
How Imputation Can Mitigate Ascertainment Bias

Abstract

Background: Population genetic studies based on genotyped single nucleotide polymorphisms (SNPs) are influenced by the non-random selection of the SNPs included on the genotyping arrays used. The resulting bias relative to whole-genome sequencing (WGS) data is known as SNP ascertainment bias. Correcting for this bias requires detailed knowledge of the array design process, which is often not available in practice. This study investigates an alternative approach: mitigating ascertainment bias in a large set of genotyped individuals by imputation, using information from a small set of sequenced individuals, without the need for prior knowledge of the array design.

Results: The strategy was first tested by simulating additional ascertainment bias with a set of 1,566 chickens from 74 populations that were genotyped for the positions of the Affymetrix Axiom™ 580k Genome-Wide Chicken Array. Imputation accuracy was consistently higher for populations used for SNP discovery during the simulated array design process. Reference sets containing at least one individual per population in the study set led to a strong correction of ascertainment bias for estimates of expected and observed heterozygosity, Wright's fixation index and Nei's standard genetic distance. In contrast, unbalanced reference sets introduced a new bias towards the reference populations. Finally, the array genotypes were imputed to WGS using reference sets ranging from 74 individuals (one per population) to 98 individuals (with additional commercial chickens) and compared with a mixture of individually sequenced and pool-sequenced populations. Imputation reduced the slope between heterozygosity estimates of array data and WGS data from 1.94 to 1.26 when using the smaller, balanced reference panel and to 1.44 when using the larger but unbalanced reference panel. This generally supported the simulation results, but the correction was less favorable, which argues for a larger reference panel when imputing to WGS.

Conclusions: The results highlight the potential of imputation for mitigating SNP ascertainment bias but also underline the need for unbiased reference sets.
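The statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0/1/2 genotype coding, and the direction of the regression (array-based estimates regressed on WGS-based estimates) are assumptions made for illustration, and the estimators are the simple textbook forms without sample-size correction.

```python
import numpy as np


def expected_heterozygosity(genotypes):
    """Mean expected heterozygosity 2p(1-p) over biallelic SNPs.

    genotypes: array of shape (individuals, snps) holding
    alternate-allele counts 0, 1 or 2 (assumed coding).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0  # alternate-allele frequency per SNP
    return float(np.mean(2.0 * p * (1.0 - p)))


def observed_heterozygosity(genotypes):
    """Share of heterozygous calls (count == 1) over all genotypes."""
    g = np.asarray(genotypes)
    return float(np.mean(g == 1))


def nei_standard_distance(p_x, p_y):
    """Nei's standard genetic distance D = -ln(J_xy / sqrt(J_x * J_y)).

    p_x, p_y: alternate-allele frequencies per biallelic SNP
    in populations X and Y.
    """
    p_x = np.asarray(p_x, dtype=float)
    p_y = np.asarray(p_y, dtype=float)
    j_xy = np.mean(p_x * p_y + (1.0 - p_x) * (1.0 - p_y))
    j_x = np.mean(p_x ** 2 + (1.0 - p_x) ** 2)
    j_y = np.mean(p_y ** 2 + (1.0 - p_y) ** 2)
    return float(-np.log(j_xy / np.sqrt(j_x * j_y)))


def regression_slope(wgs_estimates, array_estimates):
    """Least-squares slope of array-based on WGS-based estimates.

    Which variable goes on which axis is an assumption here.
    """
    slope, _intercept = np.polyfit(wgs_estimates, array_estimates, 1)
    return float(slope)


if __name__ == "__main__":
    # Toy genotypes: 2 individuals x 3 SNPs (made-up values).
    toy = np.array([[0, 1, 2],
                    [1, 1, 0]])
    print(expected_heterozygosity(toy))
    print(observed_heterozygosity(toy))
```

With one heterozygosity estimate per population from each data source, `regression_slope` returns the kind of slope the abstract reports. Under the assumed direction, a slope closer to 1 would mean that array-based and WGS-based estimates scale more similarly across populations.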

Research Square Platform LLC

Johannes Geibel, Christian Reimer, Torsten Pook, Steffen Weigend, Annett Weigend, Henner Simianer

2021


Evaluation of sequencing strategies for whole-genome imputation with hybrid peeling

Abstract Background For assembling large whole-genome sequence datasets to be used routinely in research and breeding, the sequ...

Genotype Imputation

Abstract A missing data problem arises in genetic epidemiological studies when genotypes of particular markers are unavailable fo...

GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies

Abstract Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Imp...

A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis

Background: Metabolomics data generation and quantification are different from other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS) based (gas chromatogra...

weIMPUTE: A User-Friendly Web-Based Genotype Imputation Platform

Abstract Genotype imputation is a critical preprocessing step in genome-wide association studies (GWAS), enhancing statistical power for detecting associated single...

Imputation of Spatially-resolved Transcriptomes by Graph-regularized Tensor Completion

Abstract High-throughput spatial-transcriptomics RNA sequencing (sptRNA-seq) based on in-situ capturing technologies has recently been developed ...

Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis

Abstract Background The COVID-19 pandemic has revealed significant challenges in disease forecasting and in developing a public health response, ...


Related Results