
Columba: Fast Approximate Pattern Matching with Optimized Search Schemes

Abstract
Aligning sequencing reads to reference genomes is a fundamental task in bioinformatics.
Aligners can be classified as lossy or lossless: lossy aligners prioritize speed by reporting only one or a few high-scoring alignments, whereas lossless aligners output all optimal alignments, ensuring completeness and sensitivity.
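To make the lossy/lossless distinction concrete, here is a minimal Python sketch (not Columba's code) that enumerates every text position where a read matches with at most k edits, using the classic O(mn) semi-global dynamic program. The function names and the toy read and text are hypothetical. A lossless aligner must report every optimal hit such a brute-force baseline finds (`all_best`), whereas a lossy aligner may stop after one (`one_best`).

```python
def occurrences_within_k(read: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end, distance) for every 1-based text position `end` at which
    some substring of `text` ending there is within k edits of `read`."""
    m = len(read)
    # prev[i]: best edit distance of read[:i] against a substring ending at
    # the previous text column; before any text is read, it is simply i.
    prev = list(range(m + 1))
    hits = []
    for end, ch in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + (read[i - 1] != ch),  # match or substitution
                prev[i] + 1,                        # extra text character
                cur[i - 1] + 1,                     # read character absent from text
            )
        if cur[m] <= k:
            hits.append((end, cur[m]))
        prev = cur
    return hits


def all_best(hits):
    """Lossless view: every hit attaining the minimum distance."""
    if not hits:
        return []
    best = min(d for _, d in hits)
    return [h for h in hits if h[1] == best]


def one_best(hits):
    """Lossy view: a single best hit (ties broken by the leftmost end)."""
    return min(hits, key=lambda h: h[1]) if hits else None


hits = occurrences_within_k("ACGT", "XXACGTXX", k=1)
print(hits)            # [(5, 1), (6, 0), (7, 1)]
print(all_best(hits))  # [(6, 0)]
print(one_best(hits))  # (6, 0)
```

Note that the hits ending at positions 5 and 7 are shifted versions of the exact match; real aligners typically filter such redundant hits before reporting.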
This paper introduces Columba, a high-performance lossless aligner tailored for Illumina sequencing data.
Columba processes single-end or paired-end reads in FASTQ format and outputs alignments in SAM format.
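The abstract does not describe Columba's own readers and writers, so the following is only a minimal sketch of the two formats: it parses four-line FASTQ records and formats single-end SAM lines with the 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The function names, the reference name `chr1` and the example read are hypothetical; paired-end flags, optional tags and line-wrapped FASTQ are deliberately left out.

```python
from typing import Iterable, Iterator, NamedTuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse 4-line FASTQ records (header, sequence, '+', qualities)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # QNAME is the header up to the first whitespace
        yield FastqRecord(header[1:].split()[0], seq, qual)


def sam_line(rec: FastqRecord, ref_name: str, pos: int, cigar: str,
             reverse: bool = False, mapq: int = 255) -> str:
    """One mapped single-end SAM line; pos is 1-based, MAPQ 255 = unavailable."""
    flag = 16 if reverse else 0  # 0x10: SEQ is reverse complemented
    seq, qual = rec.seq, rec.qual
    if reverse:  # SAM stores SEQ/QUAL as aligned to the forward strand
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    fields = [rec.name, flag, ref_name, pos, mapq, cigar, "*", 0, 0, seq, qual]
    return "\t".join(map(str, fields))


def unmapped_sam_line(rec: FastqRecord) -> str:
    """FLAG 0x4 (unmapped): no reference, position, or CIGAR."""
    fields = [rec.name, 4, "*", 0, 0, "*", "*", 0, 0, rec.seq, rec.qual]
    return "\t".join(map(str, fields))


records = list(read_fastq(["@r1 extra\n", "AACG\n", "+\n", "IIJJ\n"]))
print(sam_line(records[0], "chr1", 100, "4M"))
print(sam_line(records[0], "chr1", 100, "4M", reverse=True))
```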
Columba owes its speed to optimized search schemes combined with bit-parallel alignment techniques.
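In the search-scheme framework, a read is split into p parts and matched by several searches; each search fixes the order π in which parts are matched and lower/upper bounds L and U on the cumulative error count after each part. A scheme is lossless if every distribution of at most k errors over the parts satisfies the bounds of at least one search. The Python sketch below only checks that property; it does not reproduce how Columba optimizes its schemes, and the two example schemes are illustrative rather than taken from Columba.

```python
from itertools import product

# A search is a triple (pi, L, U): pi is the order in which the read's parts
# are matched (0-based part indices); L[s] and U[s] bound the cumulative
# number of errors allowed after the first s + 1 parts of that order.


def covers(search, errors):
    """True if `search` accepts an occurrence with per-part error counts `errors`."""
    pi, lower, upper = search
    total = 0
    for step, part in enumerate(pi):
        total += errors[part]
        if not lower[step] <= total <= upper[step]:
            return False
    return True


def uncovered(scheme, parts, k):
    """Error distributions (at most k errors over `parts` parts) no search accepts."""
    return [e for e in product(range(k + 1), repeat=parts)
            if sum(e) <= k and not any(covers(s, e) for s in scheme)]


def is_lossless(scheme, parts, k):
    return not uncovered(scheme, parts, k)


# Illustrative schemes: two searches over 2 parts for k = 1,
# and three searches over 3 parts for k = 2.
SCHEME_K1 = [((0, 1), (0, 0), (0, 1)),
             ((1, 0), (0, 1), (0, 1))]
SCHEME_K2 = [((0, 1, 2), (0, 0, 0), (0, 2, 2)),
             ((2, 1, 0), (0, 0, 0), (0, 1, 2)),
             ((1, 0, 2), (0, 0, 1), (0, 1, 2))]

print(is_lossless(SCHEME_K1, parts=2, k=1))    # True
print(uncovered(SCHEME_K1[:1], parts=2, k=1))  # [(1, 0)]
print(is_lossless(SCHEME_K2, parts=3, k=2))    # True
```

Dropping the second search loses every occurrence whose single error falls in part 0; recovering those requires starting the search from part 1, which is why search schemes rely on an index that can extend a match in both directions.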
Columba is available in two variants.
The first is based on the bidirectional FM-index.
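The first variant's bidirectional FM-index keeps synchronized suffix-array intervals over the reference and its reverse, so a partial match can be extended by one character on either side. The sketch below illustrates that general idea in Python with a naive suffix array and a linear-time rank (`str.count`); the class and function names are hypothetical, and real indexes use compact occurrence tables instead.

```python
def bwt(text: str) -> str:
    """BWT of `text`, which must end with a unique smallest sentinel '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    return "".join(text[i - 1] for i in sa)  # text[-1] == '$' when i == 0


class FMIndex:
    def __init__(self, text: str):
        self.bwt = bwt(text)
        self.alphabet = sorted(set(text))
        self.C = {}  # C[c] = number of characters in text smaller than c
        total = 0
        for c in self.alphabet:
            self.C[c] = total
            total += text.count(c)

    def occ(self, c: str, i: int) -> int:
        """Number of c in bwt[:i] (naive linear-time rank)."""
        return self.bwt.count(c, 0, i)


class BidirectionalFMIndex:
    """An interval (fwd_start, rev_start, size) covers the forward-BWT rows whose
    suffixes start with P and the reverse-BWT rows starting with reverse(P)."""

    def __init__(self, text: str):
        self.fwd = FMIndex(text + "$")
        self.rev = FMIndex(text[::-1] + "$")
        self.n = len(text) + 1

    def full(self):
        return (0, 0, self.n)  # interval of the empty pattern

    @staticmethod
    def _step(idx, start, size, c):
        """LF-step in `idx`, plus the count of characters < c in the interval."""
        lo, hi = idx.occ(c, start), idx.occ(c, start + size)
        smaller = sum(idx.occ(b, start + size) - idx.occ(b, start)
                      for b in idx.alphabet if b < c)
        return idx.C[c] + lo, hi - lo, smaller

    def extend_left(self, iv, c):
        """P -> cP: LF-step in the forward BWT, shift inside the reverse interval."""
        f, r, size = iv
        if c not in self.fwd.C:
            return None
        new_f, new_size, smaller = self._step(self.fwd, f, size, c)
        return (new_f, r + smaller, new_size) if new_size else None

    def extend_right(self, iv, c):
        """P -> Pc: LF-step in the reverse BWT, shift inside the forward interval."""
        f, r, size = iv
        if c not in self.rev.C:
            return None
        new_r, new_size, smaller = self._step(self.rev, r, size, c)
        return (f + smaller, new_r, new_size) if new_size else None


def count(index, pattern, seed):
    """Match pattern[seed:] left to right, then pattern[:seed] right to left."""
    iv = index.full()
    for c in pattern[seed:]:
        iv = index.extend_right(iv, c)
        if iv is None:
            return 0
    for c in reversed(pattern[:seed]):
        iv = index.extend_left(iv, c)
        if iv is None:
            return 0
    return iv[2]


index = BidirectionalFMIndex("ACGTACGTTACG")  # hypothetical toy reference
print(count(index, "CGT", 0), count(index, "CGT", 1), count(index, "CGT", 3))  # 2 2 2
```

Whatever the starting point (`seed`), the occurrence count is the same; this freedom to begin in the middle of a read is what lets different searches of a scheme match parts in different orders.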
The second, Columba RLC, employs run-length compression using a bidirectional move structure, significantly reducing memory usage for large, repetitive datasets like pan-genomes.
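Run-length compression pays off because the BWT of a repetitive text consists of long runs of identical characters, so storing one (character, length) pair per run takes space proportional to the number of runs r rather than the text length n. The Python sketch below shows only the run-length encoded BWT and a rank query answered from the runs; it does not implement the bidirectional move structure used by Columba RLC. The toy "pan-genome" (one hypothetical 20-base sequence repeated 8 times) is an assumption for illustration.

```python
from itertools import groupby


def bwt(text: str) -> str:
    """BWT of `text`, which must end with a unique smallest sentinel '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    return "".join(text[i - 1] for i in sa)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    return [(c, sum(1 for _ in g)) for c, g in groupby(s)]


def rank(runs: list[tuple[str, int]], c: str, i: int) -> int:
    """Occurrences of c among the first i BWT characters, using only the runs.
    (Linear scan for clarity; compressed indexes use sampled run boundaries.)"""
    count = pos = 0
    for ch, length in runs:
        if pos >= i:
            break
        if ch == c:
            count += min(length, i - pos)
        pos += length
    return count


# A highly repetitive toy reference: the BWT needs far fewer runs than characters.
base = "ACGTTGCATGCAAGTCCGTA"
b = bwt(base * 8 + "$")
runs = run_length_encode(b)
print(len(b), len(runs))  # n versus r
print(rank(runs, "A", len(b)) == b.count("A"))
```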
Extensive benchmarking shows that Columba is faster than existing lossless aligners, particularly at higher error rates.
Tests on the human genome, as well as on bacterial and human pan-genome datasets, demonstrate Columba’s robustness and efficiency.
We integrated Columba into the OptiType HLA genotyping pipeline, where it substantially reduced computational time while maintaining accuracy.
These results position Columba as a versatile, state-of-the-art tool for high-sensitivity genomic analyses.

Related Results

NetNDP: Nonoverlapping (delta, gamma)-approximate pattern matching
Pattern matching can be used to calculate the support of patterns, and is a key issue in sequential pattern mining (or sequence pattern mining). Nonoverlapping pattern matching mea...
A Fast Pattern Matching Algorithm Based on Middle Characters of Pattern String
String pattern matching is an important string operation. At present, string pattern-matching algorithms mainly include the BF algorithm, the KMP algorithm, and improved KM...
2021 Census to Census Coverage Survey Matching Results.
The 2021 England and Wales Census was matched to the Census Coverage Survey (CCS). This was an essential requisite for estimating undercount in the Census. To ensure outputs could ...
Evaluating the Science to Inform the Physical Activity Guidelines for Americans Midcourse Report
Abstract The Physical Activity Guidelines for Americans (Guidelines) advises older adults to be as active as possible. Yet, despite the well documented benefits of physical a...
Using Metadata to Understand Search Behavior in Digital Libraries
This thesis explores how search log analysis can be used to gain a deeper understanding of online search behavior in curated collections by leveraging the metadata. For this, we us...
CIE S 014-1:2006 Colorimetry - Part 1: CIE Standard Colorimetric Observers
Superseded by Colorimetry - Part 1: CIE Standard Colorimetric Observers, 2nd Edition. Joint ISO/CIE Standard ISO 11664-1:2007(E)/CIE S 014-1/E:2006. This CIE Sta...
Search engines and their search strategies: the effective use by Indian academics
Purpose – The purpose of this paper is to examine the use of various search engines and meta search engines by Indian academics for retrieving information on the we...
A text pattern‐matching tool based on Parsing Expression Grammars
Abstract Current text pattern‐matching tools are based on regular expressions. However, pure regular expressions have proven too weak a formalism for the task: many interesting patt...
