Weighted minimizer sampling improves long read mapping
Abstract
Motivation
In this era of exponential data growth, minimizer sampling has become a standard algorithmic technique for rapid genome sequence comparison. This technique yields a sub-linear representation of sequences, enabling their comparison in reduced space and time. A key property of the minimizer technique is that if two sequences share a substring of a specified length, then they can be guaranteed to have a matching minimizer. However, because the k-mer distribution in eukaryotic genomes is highly uneven, minimizer-based tools (e.g., Minimap2, Mashmap) opt to discard the most frequently occurring minimizers from the genome in order to avoid excessive false positives. By doing so, the underlying guarantee is lost and accuracy is reduced in repetitive genomic regions.
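The match guarantee described above can be illustrated with a minimal sketch of plain (unweighted) window minimizers. This is not code from Minimap2 or Mashmap; the function names, the choice of BLAKE2 as the k-mer hash, and the leftmost tie-break are assumptions made for illustration. In each window of w consecutive k-mers, the k-mer with the smallest hash is sampled, so two sequences that share a substring of length at least w + k - 1 contain an identical window and therefore sample the same k-mer from it.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (BLAKE2 is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int, w: int) -> list[int]:
    """Return sorted start positions of the sampled k-mers.

    For every window of w consecutive k-mers, the k-mer with the smallest
    hash is sampled; ties are broken by the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hashes = [kmer_hash(km) for km in kmers]
    picked = set()
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        picked.add(best)
    return sorted(picked)


if __name__ == "__main__":
    k, w = 5, 4
    shared = "ACGTTGCATGCCGATAGGCTA"  # longer than w + k - 1 = 8
    a = "TTTTTTTT" + shared + "CCCC"
    b = "GGAGGA" + shared
    sampled_a = {a[i:i + k] for i in minimizers(a, k, w)}
    sampled_b = {b[i:i + k] for i in minimizers(b, k, w)}
    # The intersection is non-empty because both sequences contain
    # at least one identical window inside the shared substring.
    print(sorted(sampled_a & sampled_b))
```

Discarding the most frequent sampled k-mers after this step is what breaks the guarantee: a window whose only sampled k-mer is discarded no longer contributes any match.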
Results
We introduce a novel weighted-minimizer sampling algorithm. A unique feature of the proposed algorithm is that it performs minimizer sampling while taking into account a weight for each k-mer; i.e., the higher the weight of a k-mer, the more likely it is to be selected. By down-weighting frequently occurring k-mers, we are able to meet both objectives: (i) avoid excessive false-positive matches, and (ii) maintain the minimizer match guarantee. We tested our algorithm, Winnowmap, using both simulated and real long-read data and compared it to a state-of-the-art long-read mapper, Minimap2. Our results demonstrate a reduction in the mapping error rate from 0.14% to 0.06% in the recently finished human X chromosome (154.3 Mbp), and from 3.6% to 0% within the highly repetitive X centromere (3.1 Mbp). Winnowmap improves mapping accuracy within repeats and achieves these results with sparser sampling, leading to better index compression and competitive runtimes.
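The idea of weighting k-mers while keeping the match guarantee can be sketched as follows. This is an illustrative sketch, not Winnowmap's implementation: it assumes a standard weighted-sampling transform in which a uniform hash u in (0, 1) is mapped to the order -ln(u) / weight and the smallest order in each window is sampled, so that among distinct k-mers a higher weight makes selection more likely. The function names, the count threshold, and the low weight of 0.125 are hypothetical. Because the order depends only on the k-mer and its weight, every window still yields a sampled k-mer and identical windows still sample the same one.

```python
import hashlib
import math
from collections import Counter


def unit_hash(kmer: str) -> float:
    """Map a k-mer to a deterministic value in (0, 1]; never returns 0."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return (int.from_bytes(digest, "big") + 1) / (2**64 + 1)


def weighted_order(kmer: str, weight: float) -> float:
    """Smaller is better; a larger weight shrinks the order on average."""
    return -math.log(unit_hash(kmer)) / weight


def downweight_frequent(seq: str, k: int, max_count: int,
                        low_weight: float = 0.125) -> dict[str, float]:
    """Hypothetical helper: give k-mers seen more than max_count times a low weight."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {km: low_weight for km, c in counts.items() if c > max_count}


def weighted_minimizers(seq: str, k: int, w: int,
                        weights: dict[str, float],
                        default_weight: float = 1.0) -> list[int]:
    """Sample the k-mer with the smallest weighted order in each window of w k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    order = [weighted_order(km, weights.get(km, default_weight)) for km in kmers]
    picked = set()
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (order[i], i))
        picked.add(best)
    return sorted(picked)


if __name__ == "__main__":
    k, w = 5, 4
    reference = "ACGTACGTACGTACGTACGT" + "GATTACAGGCTTAACCGT" + "ACGTACGTACGT"
    weights = downweight_frequent(reference, k, max_count=3)
    positions = weighted_minimizers(reference, k, w, weights)
    print([reference[i:i + k] for i in positions])
```

In this sketch nothing is discarded: frequent k-mers are merely less likely to win a window, which is how both objectives stated above can hold at once. The same weights must be used when sampling the reference and the reads for identical windows to agree.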
Contact
adam.phillippy@nih.gov
Availability
Winnowmap is built on top of the Minimap2 codebase (Li, 2018) and is available at https://github.com/marbl/winnowmap.

