IntroUNET: identifying introgressed alleles via semantic segmentation
1 Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e., introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of that individual’s alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled “ghost” population, performing comparably to a supervised learning method tailored specifically to that task. Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method’s success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
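The per-allele framing can be made concrete with a minimal Python sketch. It is an illustration only, not the authors' implementation: the function names, the 0.5 threshold, and the assumption that both populations have the same number of sampled individuals are all hypothetical, and the probabilities are hand-written stand-ins for what a trained segmentation network would output.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def stack_populations(pop1: Sequence[Sequence[int]],
                      pop2: Sequence[Sequence[int]]) -> List[Matrix]:
    """Build a two-channel "image" from two 0/1 allele matrices.

    Rows are individuals and columns are sites; channel 0 holds
    population 1 and channel 1 holds population 2. Equal shapes are
    an assumption of this sketch.
    """
    if len(pop1) != len(pop2) or any(len(a) != len(b) for a, b in zip(pop1, pop2)):
        raise ValueError("both populations must have the same shape")
    return [[list(row) for row in pop1], [list(row) for row in pop2]]


def segment(probabilities: Sequence[Sequence[float]],
            threshold: float = 0.5) -> List[List[int]]:
    """Turn per-allele introgression probabilities into a binary mask.

    In semantic segmentation every pixel gets its own label; here each
    pixel is one allele of one individual (1 = called introgressed).
    """
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]


def introgressed_frequency(mask: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of individuals called introgressed at each site."""
    n_individuals = len(mask)
    return [sum(column) / n_individuals for column in zip(*mask)]


# Toy alignment: 2 individuals per population, 3 sites.
image = stack_populations([[0, 1, 0], [1, 1, 0]],
                          [[1, 0, 0], [0, 0, 1]])

# Hypothetical network output for population 1 (one probability per allele).
mask = segment([[0.9, 0.2, 0.5],
                [0.1, 0.7, 0.4]])
frequencies = introgressed_frequency(mask)
```

The output mask has the same shape as the input alignment, and that is the difference from classifying a whole genomic window: per-site summaries such as the frequency of introgressed alleles can be read directly from it.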
2 Author Summary
It is now known that a sizeable fraction of species occasionally hybridize with related species. Thus, many species harbor genetic material that traces its ancestry to closely related species. For example, many humans carry DNA that was “introgressed” from Neanderthals. The growing appreciation of how common introgression is has sparked a keen interest in determining which portions of the genome were introgressed. Several statistical approaches have been devised for identifying the population genetic signatures of introgression, but the most powerful techniques for this task take advantage of modern machine learning. Here, we describe a deep learning method for identifying segments of introgressed DNA. This method is based on neural networks used to determine which pixels in an image belong to which type of object. By treating a matrix of genotypes from a sample of individuals from two closely related species as an image, we can use this deep learning approach to accurately infer which portions of which genomes from the first population were introgressed from the second, and vice versa. We show that our method, which we have released as an open-source software package, is highly accurate across a variety of simulated scenarios and in a real test case from the genus Drosophila.

