Ultrafast and ultralarge multiple sequence alignments using TWILIGHT
Abstract
Motivation
Multiple sequence alignment (MSA) is a fundamental operation in bioinformatics, yet existing MSA tools struggle to keep up with the speed and volume of incoming data. Their runtimes and memory requirements become untenable when processing large numbers of long input sequences, and the tools fail to fully harness the parallelism provided by modern CPUs and GPUs.
Results
We present Tall and Wide Alignments at High Throughput (TWILIGHT), a novel MSA tool optimized for speed, accuracy, scalability, and memory constraints, with both CPU and GPU support. TWILIGHT incorporates innovative parallelization and memory-efficiency strategies that enable it to build ultralarge alignments at high speed even on memory-constrained devices. On challenging datasets, TWILIGHT outperformed all other tools in speed and accuracy. It scaled beyond the limits of existing tools and performed an alignment of 1 million RNASim sequences within 30 min while utilizing <16 GB of memory. TWILIGHT is the first tool to align over 8 million publicly available SARS-CoV-2 sequences, setting a new standard for large-scale genomic alignment and data analysis.
Availability and implementation
TWILIGHT’s code is freely available under the MIT license at https://github.com/TurakhiaLab/TWILIGHT. The test datasets and experimental results, including our alignment of 8 million SARS-CoV-2 sequences, are available at https://zenodo.org/records/14722035.

