Smart variant filtering
Variant filtering preserves high-confidence variants and removes falsely called ones. Secondary analysis of genomic DNA focuses mainly on alignment and variant calling because these two steps strongly influence overall quality. As a result, the variant filtering step has mostly been overlooked or examined only in deeper testing. However, variant filtering can substantially boost the precision of variant calls.
Here, we created a Smart Variant Filtering (SVF) framework. Conceptually, the SVF framework has three phases: (i) selecting a locally optimal machine learning algorithm configuration for the Genome In A Bottle variant-called samples (HG001-HG005); (ii) learning the parameters for that configuration from a training set; (iii) using the learned parameters to perform variant filtering on novel datasets. SVF is available on GitHub (https://github.com/sbg/smart-variant-filtering) and also as a Public project (https://igor.sbgenomics.com/u/sevenbridges/smart-variant-filtering) on the Seven Bridges Platform. It is open source and free to use by any party (BSD-3 license).
Phase (i) consisted of brute-force testing across 372 different algorithm and parameter configurations, using automated 10-fold cross-validation on 123,000 variants. Based on these results, we selected a locally optimal classifier and configuration (a multilayer perceptron with 250 nodes in the hidden layer). Phase (ii) trained the network selected in the prior phase on 25 million variants to learn the network weights (the model) to be applied in Phase (iii).
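The selection procedure in Phase (i) can be illustrated with a minimal sketch of a brute-force configuration search scored by k-fold cross-validation. This is not the SVF code: the fixed quality-threshold "classifier", the three candidate configurations, the accuracy metric, and the toy data are all assumptions made for illustration, standing in for the 372 real configurations and the multilayer perceptron described above.

```python
from statistics import mean


def k_fold_indices(n_samples, k, rng=None):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once.

    Pass rng=random.Random(seed) to shuffle first; without it the folds are
    strided, which assumes the records are already in random order.
    """
    indices = list(range(n_samples))
    if rng is not None:
        rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, folds[held_out]


def accuracy(truth, preds):
    """Fraction of predictions that match the truth labels."""
    return sum(t == p for t, p in zip(truth, preds)) / len(truth)


def cross_validate(config, quality, labels, k=10):
    """Mean held-out accuracy of one configuration over k folds."""
    scores = []
    for _train_idx, test_idx in k_fold_indices(len(labels), k):
        # A real classifier would be fitted on the training indices here;
        # the hypothetical fixed-threshold stand-in has nothing to learn.
        preds = [quality[i] >= config["threshold"] for i in test_idx]
        truth = [labels[i] for i in test_idx]
        scores.append(accuracy(truth, preds))
    return mean(scores)


def select_best(configs, quality, labels, k=10):
    """Brute force: score every configuration and keep the best one."""
    return max(configs, key=lambda c: cross_validate(c, quality, labels, k))


# Toy data (made up): 100 variants; true calls are those with quality >= 30.
quality = list(range(100))
labels = [q >= 30 for q in quality]
configs = [{"threshold": t} for t in (10, 30, 50)]
best = select_best(configs, quality, labels)
```

The search is only locally optimal in the sense used above: it returns the best configuration among those enumerated, not the best possible one.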
We will show results from deep, 3-stage testing to demonstrate that SVF outperforms the standard variant filtering solutions currently used in most secondary DNA analyses. Smart Variant Filtering increases the precision of called SNVs (that is, it removes false positives) by up to 0.2% while keeping the overall F-score 0.12-0.27% higher than in existing solutions. Indel precision increases by up to 7.8%, while the F-score increase ranges from 0.1% to 3.2%.
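For readers less familiar with the metrics quoted here, the following sketch shows how precision, recall, and the F-score follow from counts of true positives, false positives, and false negatives. It assumes the F-score is the balanced F1 measure, which the text does not state, and the counts are made-up numbers, not SVF results.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from variant-call counts.

    tp: called variants that are real
    fp: called variants that are not real (false calls)
    fn: real variants that were missed or filtered out
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: filtering removes 8 false positives
# at the cost of discarding 2 true variants.
before = precision_recall_f1(tp=990, fp=10, fn=10)
after = precision_recall_f1(tp=988, fp=2, fn=12)
```

In this made-up case precision rises and recall falls slightly, and the F-score still improves because far more false calls than true calls were removed. That trade-off is why precision is reported alongside the F-score.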

