Evaluation of Model Fit of Inferred Admixture Proportions

Abstract: Model-based methods for genetic clustering of individuals, such as those implemented in structure or ADMIXTURE, make it possible to infer individual ancestries and study population structure. The underlying model makes several assumptions about the demographic history that shaped the analysed genetic data. One assumption is that all individuals derive from K homogeneous ancestral populations that are all well represented in the data; another is that no drift occurred after the admixture event. The histories of many real-world populations do not conform to that model, and in such cases taking the inferred admixture proportions at face value can be misleading. We propose a method to evaluate the fit of admixture models based on estimating the correlation of the residual difference between the true genotypes and the genotypes predicted by the model. When the model assumptions are not violated, the residuals from a pair of individuals are not correlated. When the fit is bad, individuals with similar demographic histories have positively correlated residuals. Using simulated and real data, we show that the method can detect a bad fit of inferred admixture proportions caused either by an insufficient number of clusters K or by demographic histories that deviate significantly from the admixture model assumptions, such as admixture from ghost populations, drift after admixture events, and non-discrete ancestral populations. We have implemented the method as open-source software that can be applied to both unphased genotypes and next-generation sequencing data.
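
The residual-correlation idea in the abstract can be illustrated with a minimal sketch. It assumes genotypes coded 0/1/2, admixture proportions Q (individuals × K), ancestral allele frequencies F (K × sites), and an expected genotype of 2·(Q·F). All function names here are hypothetical, the simulated data are purely illustrative, and this is not the authors' software: it omits any bias correction or sequencing-data handling the published implementation may apply.

```python
import numpy as np


def residual_correlation(G, Q, F):
    """Pairwise correlation of admixture-model residuals (simplified sketch).

    G: (n_ind, n_sites) genotypes coded 0/1/2
    Q: (n_ind, K) admixture proportions, rows sum to 1
    F: (K, n_sites) ancestral allele frequencies
    Returns an (n_ind, n_ind) correlation matrix computed across sites.
    """
    G = np.asarray(G, dtype=float)
    expected = 2.0 * (np.asarray(Q) @ np.asarray(F))  # model-predicted genotype
    residuals = G - expected
    # np.corrcoef treats each row (individual) as a variable
    # and each column (site) as an observation.
    return np.corrcoef(residuals)


def simulate_two_populations(n_per_pop=10, n_sites=5000, seed=0):
    """Toy data: two unadmixed populations with independent allele frequencies."""
    rng = np.random.default_rng(seed)
    F = rng.uniform(0.05, 0.95, size=(2, n_sites))
    Q = np.zeros((2 * n_per_pop, 2))
    Q[:n_per_pop, 0] = 1.0  # first half belongs to population 1
    Q[n_per_pop:, 1] = 1.0  # second half belongs to population 2
    G = rng.binomial(2, Q @ F)  # genotype = two draws at the individual frequency
    return G, Q, F


def single_cluster_fit(G):
    """Deliberately under-specified model: K = 1 with pooled allele frequencies."""
    G = np.asarray(G, dtype=float)
    Q1 = np.ones((G.shape[0], 1))
    F1 = G.mean(axis=0, keepdims=True) / 2.0
    return Q1, F1


G, Q, F = simulate_two_populations()
good = residual_correlation(G, Q, F)                 # correct K = 2 model
bad = residual_correlation(G, *single_cluster_fit(G))  # too few clusters

n = G.shape[0] // 2
off_diag = ~np.eye(G.shape[0], dtype=bool)
within = ~np.eye(n, dtype=bool)
print("good fit, mean |r| off-diagonal:", np.abs(good[off_diag]).mean())
print("bad fit, mean r within population 1:", bad[:n, :n][within].mean())
```

In this toy setting the off-diagonal correlations stay close to zero when the correct model is used, while fitting a single cluster leaves positively correlated residuals among individuals from the same population, which is the pattern the abstract describes for a bad fit.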

Related Results

Inference of recent admixture using genotype data
The inference of biogeographic ancestry (BGA) has become a focus of forensic genetics. Misinference of BGA can have profound unwanted consequences for inve...
The signal of admixture can decay rapidly when using clustering-based methods
Gene flow shapes evolutionary trajectories by introducing novel alleles, facilitating or retarding adaptation, or eroding divergence among populations. Studies commonly infer gene ...
Genetic ancestry, admixture, and population structure in rural Dominica
The Caribbean is a genetically diverse region with heterogeneous admixture compositions influenced by local island ecologies, migrations, colonial conflicts, and demographic histor...
Loter: A software package to infer local ancestry for a wide range of species
Admixture between populations provides opportunity to study biological adaptation and phenotypic variation. Admixture studies rely on lo...
Revealing the range of maximum likelihood estimates in the admixture model
Many ancestry inference tools, including STRUCTURE and ADMIXTURE, rely on the admixture model to infer both, allele frequencies ...
Estimating admixture pedigrees of recent hybrids without a contiguous reference genome
The genome of recently admixed individuals or hybrids have characteristic genetic patterns that can be used to learn about their recent ...
Non-Recommended Publishing Lists: Strategies for Detecting Deceitful Journals
The rapid growth of open access publishing (OAP) has significantly improved the accessibility and dissemination of scientific knowledge. However, this expansion has also c...