Javascript must be enabled to continue!
Generating minimum-density minimizers
View through CrossRef
Abstract
Minimizers are sampling schemes which are ubiquitous in almost any high-throughput sequencing analysis. Assuming a fixed alphabet of size
σ
, a minimizer is defined by two positive integers
k, w
and a linear order
ρ
on
k
-mers. A sequence is processed by a sliding window algorithm that chooses in each window of length
w
+
k
− 1 its minimal
k
-mer with respect to
ρ
. A key characteristic of a minimizer is its density, which is the expected frequency of chosen
k
-mers among all
k
-mers in a random infinite
σ
-ary sequence. Minimizers of smaller density are preferred as they produce smaller samples, which lead to reduced runtime and memory usage in downstream applications. While the hardness of finding a minimizer of minimum density for given input parameters (
σ, k, w
) is unknown, it has a huge search space of (
σ
k
)! and there is no known algorithm apart from a trivial brute-force search.
In this paper, we tackle the minimum density problem for minimizers. We first formulate this problem as an ILP of size
Θ
(
wσ
w
+
k
), which has worst-case solution time that is doubly-exponential in (
k
+
w
) under standard complexity assumptions. Our experiments show that an ILP solver terminates with an optimal solution only for very small
k
and
w
. We then present our main method, called OptMini, which computes an optimal minimizer in
time and thus is capable of processing large
w
values. In experiments, OptMini works much faster than the runtime predicts due to several additional tricks shrinking the search space without harming optimality. We use OptMini to compute minimum-density minimizers for (
σ, k
) ∈ {(2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (4, 2)} and
w
∈ [2, 3
σ
k
], with the exception of certain
w
-ranges for
k
= 6 and the single case of
k
= 5,
w
= 2. Finally, we derive conclusions and insights regarding the density values as a function of
w
, patterns in optimal minimizer orders, and the relation between minimum-size universal hitting sets and minimum-density minimizers.
Title: Generating minimum-density minimizers
Description:
Abstract
Minimizers are sampling schemes which are ubiquitous in almost any high-throughput sequencing analysis.
Assuming a fixed alphabet of size
σ
, a minimizer is defined by two positive integers
k, w
and a linear order
ρ
on
k
-mers.
A sequence is processed by a sliding window algorithm that chooses in each window of length
w
+
k
− 1 its minimal
k
-mer with respect to
ρ
.
A key characteristic of a minimizer is its density, which is the expected frequency of chosen
k
-mers among all
k
-mers in a random infinite
σ
-ary sequence.
Minimizers of smaller density are preferred as they produce smaller samples, which lead to reduced runtime and memory usage in downstream applications.
While the hardness of finding a minimizer of minimum density for given input parameters (
σ, k, w
) is unknown, it has a huge search space of (
σ
k
)! and there is no known algorithm apart from a trivial brute-force search.
In this paper, we tackle the minimum density problem for minimizers.
We first formulate this problem as an ILP of size
Θ
(
wσ
w
+
k
), which has worst-case solution time that is doubly-exponential in (
k
+
w
) under standard complexity assumptions.
Our experiments show that an ILP solver terminates with an optimal solution only for very small
k
and
w
.
We then present our main method, called OptMini, which computes an optimal minimizer in
time and thus is capable of processing large
w
values.
In experiments, OptMini works much faster than the runtime predicts due to several additional tricks shrinking the search space without harming optimality.
We use OptMini to compute minimum-density minimizers for (
σ, k
) ∈ {(2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (4, 2)} and
w
∈ [2, 3
σ
k
], with the exception of certain
w
-ranges for
k
= 6 and the single case of
k
= 5,
w
= 2.
Finally, we derive conclusions and insights regarding the density values as a function of
w
, patterns in optimal minimizer orders, and the relation between minimum-size universal hitting sets and minimum-density minimizers.
Related Results
10-minimizers: a promising class of constant-space minimizers
10-minimizers: a promising class of constant-space minimizers
Abstract
Minimizers are sampling schemes which are ubiquitous in almost any high-throughput sequencing analysis. Assuming a fixed alphabet of siz...
Linking White‐Tailed Deer Density, Nutrition, and Vegetation in a Stochastic Environment
Linking White‐Tailed Deer Density, Nutrition, and Vegetation in a Stochastic Environment
ABSTRACT
Density‐dependent behavior underpins white‐tailed deer (
Odocoileus virginianus
) theory and...
Fundamental Concepts and Methodology for the Analysis of Animal Population Dynamics, with Particular Reference to Univoltine Species
Fundamental Concepts and Methodology for the Analysis of Animal Population Dynamics, with Particular Reference to Univoltine Species
This paper presents some concepts and methodology essential for the analysis of population dynamics of univoltine species. Simple stochastic difference equations, comprised of endo...
Minimum Performance Standards of Tillage Implements
Minimum Performance Standards of Tillage Implements
This article focuses on formulation of minimum performance standards (MPS) for tillage machinery such as rotavator, disc harrow and cultivator. The required minimum performance sta...
A Smörgåsbord of Skyrmions
A Smörgåsbord of Skyrmions
Abstract
We study static solutions of the standard Skyrme model with a pion mass. Using approximately 105 pseudo-random initial configurations made of single S...
ON SOME NONLOCAL VARIATIONAL PROBLEMS
ON SOME NONLOCAL VARIATIONAL PROBLEMS
We study uniqueness and non uniqueness of minimizers of functionals involving nonlocal quantities. We give also conditions which lead to a lack of minimizers and we show how minimi...
ENVIRONMENT DENSITY OF A GIANT RADIO STRUCTURE FOR GALAXIES AND QUASARS WITH STEEP RADIO SPECTRA
ENVIRONMENT DENSITY OF A GIANT RADIO STRUCTURE FOR GALAXIES AND QUASARS WITH STEEP RADIO SPECTRA
Purpose: Estimate of the environment density of giant (with the linear size of about megaparsec) radio structures for galaxies and quasars with steep low-frequency spectra taken fr...
Dynamics of carbon pool in Oak dominated community forests of District Tehri Garhwal, Uttarakhand, India
Dynamics of carbon pool in Oak dominated community forests of District Tehri Garhwal, Uttarakhand, India
The present study was carried out community forest of Tehri Garhwal Uttarakhand. The amount of growing stock volume density in the study region were ranges between 28.21 m3/ha and ...

