Robust Design for Coalescent Model Inference
Abstract
The coalescent process describes how changes in the size of a population influence the genealogical patterns of sequences sampled from that population. Estimating population size changes from genealogies reconstructed from these sequence samples is an important problem in many biological fields. Often, population size is characterised by a piecewise-constant function, with each piece serving as a population size parameter to be estimated. Estimation quality depends both on the statistical coalescent inference method employed and on the experimental protocol, which controls variables such as the sampling of sequences through time and space, or the transformation of model parameters. While there is an extensive literature devoted to coalescent inference methodology, there is surprisingly little work on experimental design. The research that does exist is largely simulation based, precluding the development of provable or general design theorems. We examine three key design problems: temporal sampling of sequences under the skyline demographic coalescent model, spatio-temporal sampling for the structured coalescent model, and time discretisation for sequentially Markovian coalescent models. In all cases we prove that the combination of (i) working in the logarithm of the parameters to be inferred (e.g. population size) and (ii) distributing informative coalescent events uniformly among these log-parameters is uniquely robust. ‘Robust’ means that the total and maximum uncertainty of our estimates are minimised and are insensitive to the unknown (true) parameter values. Given its persistence among models, this formally derived two-point theorem may form the basis of an experimental design paradigm for coalescent inference.
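The two-point claim can be illustrated numerically. The sketch below rests on an assumption that is not stated in the abstract: for a piecewise-constant population size, the Fisher information about the logarithm of the j-th size parameter is taken to equal the number of coalescent events m_j informing it, so the asymptotic variance of each log-estimate is roughly 1/m_j whatever the true population size. The function names and the example event counts are hypothetical.

```python
from fractions import Fraction


def log_scale_variances(event_counts):
    """Approximate variance of each log-parameter estimate as 1 / m_j.

    Assumption (not from the abstract): the Fisher information about
    log N_j equals m_j, the number of coalescent events informing piece j.
    """
    if any(m <= 0 for m in event_counts):
        raise ValueError("every parameter needs at least one coalescent event")
    return [Fraction(1, m) for m in event_counts]


def design_summary(event_counts):
    """Return (total uncertainty, maximum uncertainty) for one allocation."""
    variances = log_scale_variances(event_counts)
    return sum(variances), max(variances)


# Two hypothetical ways of spreading 30 coalescent events over 3 parameters.
uniform = design_summary([10, 10, 10])
skewed = design_summary([24, 4, 2])

print("uniform: total", uniform[0], "max", uniform[1])
print("skewed:  total", skewed[0], "max", skewed[1])
```

Under this assumption neither summary depends on the unknown population sizes, and the even allocation gives the smaller total and the smaller maximum, which is the sense of 'robust' used in the abstract.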
Related Results
The Validity of the Coalescent Approximation for Large Samples
Abstract
The Kingman coalescent, widely used in genetics, is known to be a good approximation when the sample size is small relative to the popul...
Likelihood of social-ecological genetic model
Abstract
The genetic structure of populations depends on two parallel processes - genetic and social-ecological - providing mutual information. Models that describe...
Linkage Analysis and Coalescents
Abstract
The number of chapters in this volume, and in the research literature generally, that discuss the coalescent attests to the importance of this concept, bot...
Evolutionary Grammatical Inference
Grammatical Inference (also known as grammar induction) is the problem of learning a grammar for a language from a set of examples. In a broad sense, some data is presented to the ...
The Coalescent With Gene Conversion
Abstract
In this article we develop a coalescent model with intralocus gene conversion. The distribution of the tract length is geometric in concordance with results...
Screening Deep Learning Inference Accelerators at the Production Lines
Artificial Intelligence (AI) accelerators can be divided into two main buckets, one for training and another for inference over the trained models. Computation results of AI infere...
On the inference of complex phylogenetic networks by Markov Chain Monte-Carlo
Abstract
For various species, high quality sequences and complete genomes are nowadays available for many individuals. This makes data analysis c...

