Search engine for discovering works of Art, research articles, and books related to Art and Culture
ShareThis
Javascript must be enabled to continue!

10-minimizers: a promising class of constant-space minimizers

View through CrossRef
Abstract Minimizers are sampling schemes which are ubiquitous in almost any high-throughput sequencing analysis. Assuming a fixed alphabet of size σ , a minimizer is defined by two positive integers k, w and a linear order ρ on k -mers. A sequence is processed by a sliding window algorithm that chooses in each window of length w + k − 1 its minimal k -mer with respect to ρ . A key characteristic of a minimizer is its density, which is the expected frequency of chosen k -mers among all k -mers in a random infinite σ -ary sequence. Minimizers of smaller density are preferred as they produce smaller samples, which lead to reduced runtime and memory usage in downstream applications. Recent studies developed methods to generate minimizers with optimal and near-optimal densities, but they require to explicitly store k -mer ranks in Ω (2 k ) space. While constant-space minimizers exist, and some of them are proven to be asymptotically optimal, no constant-space minimizers was proven to guarantee lower density compared to a random minimizer in the non-asymptotic regime, and many minimizer schemes suffer from long k -mer key-retrieval times due to complex computation. In this paper, we introduce 10-minimizers, which constitute a class of minimizers with promising properties. First, we prove that for every k > 1 and every w ≥ k − 2, a random 10-minimizer has, on expectation, lower density than a random minimizer. This is the first provable guarantee for a class of minimizers in the non-asymptotic regime. Second, we present spacers, which are particular 10-minimizers combining three desirable properties: they are constant-space, low-density, and have small k -mer key-retrieval time. In terms of density, spacers are competitive to the best known constant-space minimizers; in certain ( k, w ) regimes they achieve the lowest density among all known (not necessarily constant-space) minimizers. Notably, we are the first to benchmark constant-space minimizers in the time spent for k -mer key retrieval, which is the most fundamental operation in many minimizers-based methods. Our empirical results show that spacers can retrieve k -mer keys in competitive time (a few seconds per genome-size sequence, which is less than required by random minimizers), for all practical values of k and w . We expect 10-minimizers to improve minimizers-based methods, especially those using large window sizes. We also propose the k -mer key-retrieval benchmark as a standard objective for any new minimizer scheme.
Title: 10-minimizers: a promising class of constant-space minimizers
Description:
Abstract Minimizers are sampling schemes which are ubiquitous in almost any high-throughput sequencing analysis.
Assuming a fixed alphabet of size σ , a minimizer is defined by two positive integers k, w and a linear order ρ on k -mers.
A sequence is processed by a sliding window algorithm that chooses in each window of length w + k − 1 its minimal k -mer with respect to ρ .
A key characteristic of a minimizer is its density, which is the expected frequency of chosen k -mers among all k -mers in a random infinite σ -ary sequence.
Minimizers of smaller density are preferred as they produce smaller samples, which lead to reduced runtime and memory usage in downstream applications.
Recent studies developed methods to generate minimizers with optimal and near-optimal densities, but they require to explicitly store k -mer ranks in Ω (2 k ) space.
While constant-space minimizers exist, and some of them are proven to be asymptotically optimal, no constant-space minimizers was proven to guarantee lower density compared to a random minimizer in the non-asymptotic regime, and many minimizer schemes suffer from long k -mer key-retrieval times due to complex computation.
In this paper, we introduce 10-minimizers, which constitute a class of minimizers with promising properties.
First, we prove that for every k > 1 and every w ≥ k − 2, a random 10-minimizer has, on expectation, lower density than a random minimizer.
This is the first provable guarantee for a class of minimizers in the non-asymptotic regime.
Second, we present spacers, which are particular 10-minimizers combining three desirable properties: they are constant-space, low-density, and have small k -mer key-retrieval time.
In terms of density, spacers are competitive to the best known constant-space minimizers; in certain ( k, w ) regimes they achieve the lowest density among all known (not necessarily constant-space) minimizers.
Notably, we are the first to benchmark constant-space minimizers in the time spent for k -mer key retrieval, which is the most fundamental operation in many minimizers-based methods.
Our empirical results show that spacers can retrieve k -mer keys in competitive time (a few seconds per genome-size sequence, which is less than required by random minimizers), for all practical values of k and w .
We expect 10-minimizers to improve minimizers-based methods, especially those using large window sizes.
We also propose the k -mer key-retrieval benchmark as a standard objective for any new minimizer scheme.

Related Results

SPECIFICATION FOR TESTING AUTOMOTIVE MINIATURE BULBS
SPECIFICATION FOR TESTING AUTOMOTIVE MINIATURE BULBS
<div class="section abstract"> <div class="htmlview paragraph">The procedures contained in this specification cover the laboratory testing of miniature incandescent b...
METALCLAD RIGID AIRSHIP DEVELOPMENT1
METALCLAD RIGID AIRSHIP DEVELOPMENT1
<div class="htmlview paragraph">Several years ago some of the most prominent leaders in automotive industries cooperated to form a purely engineering group that had as its pr...
Fuze Well Mechanical Interface
Fuze Well Mechanical Interface
<div class="section abstract"> <div class="htmlview paragraph">This interface standard applies to fuzes used in airborne weapons that use a 3-Inch Fuze Well. It defin...
Generating minimum-density minimizers
Generating minimum-density minimizers
Abstract Minimizers are sampling schemes which are ubiquitous in almost any high-throughput sequencing analysis. Assuming a fixed alphabet of siz...
ANALISIS KONFLIK SUMBERDAYA HUTAN DI KAWASAN KONSERVASI
ANALISIS KONFLIK SUMBERDAYA HUTAN DI KAWASAN KONSERVASI
<p class="MsoNormal" style="margin: 6pt 0cm; text-align: justify;"><span class="hps"><em><span style="font-size: 11pt;">This </span></em></sp...
KNOWLEDGE AND PREVENTION OF DEMENTIA AMONG THE ELDERLY
KNOWLEDGE AND PREVENTION OF DEMENTIA AMONG THE ELDERLY
<p class="TableParagraph"><span class="TextRun SCXW51044073 BCX8" lang="ID" xml:lang="ID" data-contrast="auto"><span class="NormalTextRun SCXW51044073 BCX8" data-ccp...
Cybersecurity Guidebook for Cyber-Physical Vehicle Systems
Cybersecurity Guidebook for Cyber-Physical Vehicle Systems
<div class="section abstract"> <div class="htmlview paragraph">This recommended practice provides guidance on vehicle Cybersecurity and was created based off of, and ...

Back to Top