
The Validity of the Coalescent Approximation for Large Samples

Abstract: The Kingman coalescent, widely used in genetics, is known to be a good approximation when the sample size is small relative to the population size. In this article, we investigate how large the sample size can get without violating the coalescent approximation. If the haploid population size is $2N$, we prove that for samples of size $N^{1/3-\epsilon}$, $\epsilon > 0$, coalescence under the Wright-Fisher (WF) model converges in probability to the Kingman coalescent in the limit of large $N$. For samples of size $N^{2/5-\epsilon}$ or smaller, the WF coalescent converges to a mixture of the Kingman coalescent and what we call the mod-2 coalescent. For samples of size $N^{1/2}$ or larger, triple collisions in the WF genealogy of the sample become important. The sample size for which the probability of conformance with the Kingman coalescent is 95% is found to be $1.47 \times N^{0.31}$ for $N \in [10^3, 10^5]$, showing the pertinence of the asymptotic theory. The probability of no triple collisions is found to be 95% for sample sizes equal to $0.92 \times N^{0.49}$, which too is in accord with the asymptotic theory. Varying population sizes are handled using algorithms that calculate the probability of WF coalescence agreeing with the Kingman model or taking place without triple collisions. For a sample of size 100, the probabilities of coalescence according to the Kingman model are 2%, 0%, 1%, and 0% in four models of human population with constant $N$, constant $N$ except for two bottlenecks, recent exponential growth, and increasing recent exponential growth, respectively. For the same four demographic models and the same sample size, the probabilities of coalescence with no triple collision are 92%, 73%, 88%, and 87%, respectively. Visualizations of the algorithm show that even distant bottlenecks can impede agreement between the coalescent and the WF model. Finally, we prove that the WF sample frequency spectrum for samples of size $N^{1/3-\epsilon}$ or smaller converges to the classical answer for the coalescent.
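The two empirical fits quoted in the abstract can be evaluated directly, and the quantities they describe can be roughly estimated by brute-force simulation. The Python sketch below is illustrative only and is not the authors' algorithm. All function names are hypothetical, and it assumes that "conformance with the Kingman coalescent" means no generation of the WF genealogy contains anything other than a single pairwise merger, and that a "triple collision" means three or more sampled lineages choosing the same parent in one generation. The population size is held constant at 2N haploid individuals.

```python
import random
from collections import Counter


def kingman_95_sample_size(N):
    """Empirical fit quoted in the abstract: 1.47 * N**0.31 (stated for 1e3 <= N <= 1e5)."""
    return 1.47 * N ** 0.31


def no_triple_95_sample_size(N):
    """Empirical fit quoted in the abstract: 0.92 * N**0.49."""
    return 0.92 * N ** 0.49


def simulate_wf_genealogy(n, N, rng):
    """Trace n lineages backward in a constant haploid WF population of size 2N.

    Returns (kingman_like, no_triple):
      kingman_like -- no generation had a triple collision or more than one
                      simultaneous pairwise merger (assumed definition)
      no_triple    -- no generation had three or more lineages sharing a parent
    """
    pop = 2 * N
    kingman_like = True
    no_triple = True
    k = n
    while k > 1:
        # Each lineage picks its parent uniformly at random from the 2N individuals.
        parents = Counter(rng.randrange(pop) for _ in range(k))
        sizes = list(parents.values())
        if max(sizes) >= 3:
            no_triple = False
            kingman_like = False
        elif sum(1 for s in sizes if s == 2) > 1:
            # Two or more pairs merged in the same generation.
            kingman_like = False
        k = len(parents)  # number of distinct ancestors one generation back
    return kingman_like, no_triple


def estimate(n, N, trials=200, seed=0):
    """Monte Carlo estimate of (P[Kingman-like], P[no triple collision])."""
    rng = random.Random(seed)
    results = [simulate_wf_genealogy(n, N, rng) for _ in range(trials)]
    p_kingman = sum(r[0] for r in results) / trials
    p_no_triple = sum(r[1] for r in results) / trials
    return p_kingman, p_no_triple


if __name__ == "__main__":
    N = 1000
    n = round(kingman_95_sample_size(N))
    print("sample size from the Kingman fit:", n)
    print("sample size from the no-triple fit:", round(no_triple_95_sample_size(N)))
    print("simulated (P[Kingman-like], P[no triple]):", estimate(n, N))
```

Because the simulation steps through every generation, it is only practical for small N and modest trial counts. The abstract describes dedicated algorithms for computing these probabilities, including under varying population sizes, which this sketch does not attempt to reproduce.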

