vcfdist: Accurately benchmarking phased small variant calls in human genomes

Abstract: Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool “vcfdist” and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased “Truth Challenge V2” submissions and show that vcfdist improves measured (SNP, INDEL) performance consistency across variant representations from R² = (0.14542, 0.97243) for baseline vcfeval to (0.99999, 0.99996) for vcfdist.
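To illustrate why the choice of affine gap parameters affects which variant representation is preferred, the following is a minimal sketch in Python. It is not taken from vcfdist: the cost convention (gap cost = open + extend × length), the `AffineCosts` and `representation_cost` names, the example edits, and the parameter values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineCosts:
    """Hypothetical affine gap parameters (not vcfdist's actual values)."""
    mismatch: int    # cost per substituted base
    gap_open: int    # one-time cost to start an insertion or deletion
    gap_extend: int  # cost per inserted or deleted base


def representation_cost(edits, costs):
    """Sum the cost of one variant representation.

    Each edit is ("sub", n) for n substituted bases, or
    ("ins", n) / ("del", n) for a gap of n bases.
    Assumed convention: gap cost = gap_open + gap_extend * n.
    """
    total = 0
    for kind, length in edits:
        if kind == "sub":
            total += costs.mismatch * length
        elif kind in ("ins", "del"):
            total += costs.gap_open + costs.gap_extend * length
        else:
            raise ValueError(f"unknown edit kind: {kind}")
    return total


# Two hypothetical ways to describe the same two-base change
# (for example AC -> GT): as two SNPs, or as a deletion plus an insertion.
as_snps = [("sub", 2)]
as_indels = [("del", 2), ("ins", 2)]

# Illustrative parameter sets that favor different representations.
snp_friendly = AffineCosts(mismatch=1, gap_open=4, gap_extend=1)
gap_friendly = AffineCosts(mismatch=5, gap_open=1, gap_extend=1)

for name, costs in (("snp_friendly", snp_friendly), ("gap_friendly", gap_friendly)):
    print(name,
          representation_cost(as_snps, costs),
          representation_cost(as_indels, costs))
```

Under the first parameter set the two-SNP form is cheaper, while under the second the deletion-plus-insertion form is cheaper. This is the kind of representation dependence that a standard parameter choice is meant to remove.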

Related Results

An optimisational model of benchmarking
Purpose: The purpose of this paper is to develop a quantitative methodology for benchmarking process which is simple, effective and efficient as a rejoinder to benchmarking detractor...
A review on benchmarking of supply chain performance measures
Purpose: The purpose of this paper is to redress the imbalances in the past literature of supply chain benchmarking and enhance data envelopment analysis (DEA) modeling approach in s...
The need for adaptive processes of benchmarking in small business‐to‐business services
Purpose: This paper aims to explore current management attitudes towards benchmarking and its implementation within small business‐to‐business service firms in order to enhance a dee...
Animal Alarm Calls
Alarm calls are broadly defined as calls occurring in a predator context. Alarm calls have been the subject of intense scrutiny in animal communication research, as they are releva...
Abstract P1-05-23: Utilities and challenges of RNA-Seq based expression and variant calling in a clinical setting
Abstract. Introduction: Variant calling based on DNA samples has been the gold standard of clinical testing since the advent of Sanger sequencing. The u...
Frequency and Diversity of Variant Philadelphia Chromosome In Chronic Myeloid Leukemia Patients
Abstract 4903: The Philadelphia chromosome (Ph), t(9;22), is detected in around 90% of chronic myeloid leukemia (CML) patients, but in the...
