Genotype Imputation
Abstract
A missing data problem arises in genetic epidemiological studies when genotypes of particular markers are unavailable for analysis for reasons of data quality, cost efficiency or technical design. In such instances, imputation methods can be used to extend scientific inference from the typed to the untyped markers. The information required to infer unobserved genotypes from observed genotypes is provided by the so‐called imputation base, an internal or external set of comprehensively typed individuals (often taken from HapMap or the 1000 Genomes Project) that is representative of the study population as a whole. The most popular genotype imputation methods, including IMPUTE, fastPHASE, MaCH and BEAGLE, employ a Markov chain model of the haplotype distribution in the population of interest. Although these frameworks have been shown to provide accurate and efficient tools of ‘in silico genotyping’ under certain conditions, their uncritical use must nevertheless be cautioned against.
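The Markov chain idea mentioned above can be illustrated with a small sketch. It assumes a simplified haploid copying model in which a study haplotype is treated as a mosaic of reference haplotypes from the imputation base. The function name `impute_haplotype` and the parameters `switch_prob` and `error_prob` are hypothetical and chosen for illustration only; this is not the algorithm of IMPUTE, fastPHASE, MaCH or BEAGLE, which use more elaborate models.

```python
import numpy as np

def impute_haplotype(reference, observed, switch_prob=0.05, error_prob=0.01):
    """Posterior probability of allele 1 at every marker of one study haplotype.

    reference : (K, M) array of 0/1 alleles (K reference haplotypes, M markers)
    observed  : length-M array with 0/1 at typed markers and -1 at untyped ones
    switch_prob, error_prob : illustrative constants, not estimated from data
    """
    reference = np.asarray(reference)
    observed = np.asarray(observed)
    K, M = reference.shape

    # Emission probabilities: untyped markers carry no information (all ones).
    emission = np.ones((M, K))
    typed = observed >= 0
    match = reference[:, typed].T == observed[typed][:, None]
    emission[typed] = np.where(match, 1.0 - error_prob, error_prob)

    # Transition: keep copying the same reference haplotype with probability
    # 1 - switch_prob, otherwise jump uniformly to any reference haplotype.
    def step(weights):
        return (1.0 - switch_prob) * weights + switch_prob * weights.sum() / K

    # Forward pass, rescaled at each marker to avoid numerical underflow.
    forward = np.zeros((M, K))
    forward[0] = emission[0] / K
    forward[0] /= forward[0].sum()
    for m in range(1, M):
        forward[m] = emission[m] * step(forward[m - 1])
        forward[m] /= forward[m].sum()

    # Backward pass (the transition matrix is symmetric, so step() is reused).
    backward = np.ones((M, K))
    for m in range(M - 2, -1, -1):
        backward[m] = step(emission[m + 1] * backward[m + 1])
        backward[m] /= backward[m].sum()

    # Posterior over which reference haplotype is copied at each marker.
    posterior = forward * backward
    posterior /= posterior.sum(axis=1, keepdims=True)

    # Allele-1 probability = posterior weight of reference haplotypes carrying 1.
    return (posterior * reference.T).sum(axis=1)

# Toy imputation base with two reference haplotypes and four markers.
reference = [[0, 0, 0, 0],
             [1, 1, 1, 1]]
observed = [1, -1, 1, -1]  # markers 2 and 4 are untyped
probs = impute_haplotype(reference, observed)
```

In this toy input the typed alleles match the second reference haplotype, so the untyped markers receive a high probability of carrying allele 1. Real data would require phased reference panels and switch rates informed by genetic distance.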
Key Concepts
Relevant genotype data may be missing in genetic epidemiological studies for technical or efficiency reasons.
Scientific inference that takes missing genotype data properly into account can be made using data imputation methods.
Genotype imputation requires an imputation base, that is, a population‐representative set of individuals (such as the HapMap or the 1000 Genomes Project) who are genotyped for all markers of interest.
Established genotype imputation methods employ a Markov chain model of the haplotype distribution in the population under study.
Genotype imputation may achieve 90% accuracy for highly polymorphic markers but performs less well for rare variants.
While genotype imputation may provide valid statistical tests of genotype–phenotype association, its use for effect size estimation and significance assessment must proceed with caution.
Genotype imputation needs to follow the same rules of good scientific practice as laboratory‐based data generation.
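The accuracy figures mentioned in the key concepts depend on how accuracy is measured, which the text above does not specify. As a hedged illustration, the sketch below computes two commonly used summaries when true genotypes are available for a masked marker: best-guess concordance and the squared correlation between imputed dosage and true genotype. The function names and the input layout (one row of three genotype probabilities per individual) are assumptions made for this example.

```python
import numpy as np

def best_guess_concordance(genotype_probs, true_genotypes):
    """Fraction of individuals whose most probable imputed genotype is correct.

    genotype_probs : (N, 3) array, probabilities of carrying 0, 1 or 2 copies
    true_genotypes : length-N array of true allele counts (0, 1 or 2)
    """
    probs = np.asarray(genotype_probs, dtype=float)
    truth = np.asarray(true_genotypes)
    return float(np.mean(probs.argmax(axis=1) == truth))

def dosage_r2(genotype_probs, true_genotypes):
    """Squared Pearson correlation between expected allele count and truth."""
    probs = np.asarray(genotype_probs, dtype=float)
    truth = np.asarray(true_genotypes, dtype=float)
    dosage = probs @ np.array([0.0, 1.0, 2.0])  # expected allele count
    if dosage.std() == 0.0 or truth.std() == 0.0:
        return float("nan")  # correlation is undefined without variation
    return float(np.corrcoef(dosage, truth)[0, 1] ** 2)

# Four individuals at one masked marker (made-up numbers for illustration).
probs = [[0.9, 0.1, 0.0],
         [0.1, 0.8, 0.1],
         [0.0, 0.2, 0.8],
         [0.6, 0.4, 0.0]]
truth = [0, 1, 2, 1]
concordance = best_guess_concordance(probs, truth)  # 3 of 4 correct: 0.75
r2 = dosage_r2(probs, truth)
```

Concordance tends to look favourable for rare variants simply because most individuals carry the common homozygous genotype, which is one reason a correlation-based measure is often reported alongside it.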