
lra: the Long Read Aligner for Sequences and Contigs

Abstract

Motivation: It is computationally challenging to detect variation by aligning long reads from single-molecule sequencing (SMS) instruments, or megabase-scale contigs from SMS assemblies. One approach to aligning long sequences efficiently is sparse dynamic programming (SDP), in which exact matches are found between the sequence and the genome, and optimal chains of matches are then selected to represent a rough alignment. Sequence variation is modeled more accurately when alignments are scored with a gap penalty that is a convex function of the gap length. Previous implementations of SDP used a linear-cost gap function that does not accurately model variation, and alignment implementations that do have a convex gap penalty are either inefficient or rely on heuristics. We therefore developed a method, lra, that uses SDP with a convex-cost gap penalty. We use lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs.

Results: Across all data types, the runtime of lra is 52-168% of that of the state-of-the-art aligner minimap2 when generating SAM alignments, and 9-15% of that of an alternative method, ngmlr. This alignment approach may be used to provide additional evidence for SV calls in PacBio datasets, and it increases sensitivity and specificity on ONT data with current SV detection algorithms. The number of calls discovered using pbsv with lra alignments is within 98.3-98.6% of the calls made from minimap2 alignments on the same data, and gives a nominal 0.2-0.4% increase in F1 score by Truvari analysis. On ONT data with SVs called using Sniffles, the number of calls made from lra alignments is 3% greater than the number of minimap2-based calls and 30% greater than the number of ngmlr-based calls, with a 4.6-5.5% increase in Truvari F1 score. When applied to calling variation from de novo assembly contigs, there is a 5.8% increase in SV calls compared to minimap2+paftools, with a 4.3% increase in Truvari F1 score.
Availability and implementation: Available in bioconda (https://anaconda.org/bioconda/lra) and on GitHub (https://github.com/ChaissonLab/LRA).

Contact: mchaisso@usc.edu, jingwenr@usc.edu
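
The chaining idea described in the abstract (exact matches linked into a best-scoring chain, with gaps between matches penalized by a nonlinear function of gap length) can be illustrated with a small sketch. This is not lra's implementation: the quadratic-time loop below stands in for a true sparse dynamic programming algorithm, and the function names, the anchor tuple layout, and the logarithmic penalty with its `open_cost` and `scale` parameters are all assumptions made for illustration. The abstract does not state which gap function lra uses.

```python
import math
from typing import List, Tuple

# Hypothetical anchor layout: (query_start, ref_start, length) of an exact match.
Anchor = Tuple[int, int, int]


def gap_cost(gap: int, open_cost: float = 2.0, scale: float = 1.0) -> float:
    """Illustrative sublinear gap penalty (an assumption, not lra's function).

    Each additional gap base costs less than the previous one, so one long
    gap is cheaper than many short gaps of the same total length.
    """
    if gap == 0:
        return 0.0
    return open_cost + scale * math.log2(gap + 1)


def chain_anchors(anchors: List[Anchor]) -> Tuple[float, List[Anchor]]:
    """Return the best-scoring colinear chain of anchors and its score.

    Naive O(n^2) dynamic programming, used here only for clarity.
    """
    order = sorted(anchors)
    n = len(order)
    if n == 0:
        return 0.0, []
    score = [0.0] * n
    prev = [-1] * n
    for i, (qi, ri, li) in enumerate(order):
        score[i] = float(li)  # a chain may start at any anchor
        for j in range(i):
            qj, rj, lj = order[j]
            # Anchor j may precede anchor i only if it ends before i starts
            # on both the query and the reference.
            if qj + lj <= qi and rj + lj <= ri:
                # Gap length: difference between query and reference spacing.
                gap = abs((qi - (qj + lj)) - (ri - (rj + lj)))
                candidate = score[j] + li - gap_cost(gap)
                if candidate > score[i]:
                    score[i] = candidate
                    prev[i] = j
    best = max(range(n), key=score.__getitem__)
    chain = []
    k = best
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return score[best], chain


if __name__ == "__main__":
    example = [(0, 0, 10), (12, 12, 10), (30, 500, 10), (25, 27, 10)]
    best_score, best_chain = chain_anchors(example)
    print(best_score, best_chain)
```

In the example, the anchor at reference position 500 is left out of the best chain because it overlaps the preceding anchor on the query, while the three nearly diagonal anchors are linked at a small gap cost.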
