
RAmbler: de novo genome assembly of complex repetitive regions

ABSTRACT

Complex repetitive regions (also called segmental duplications) in eukaryotic genomes often contain essential functional and regulatory information. Despite remarkable algorithmic progress in genome assembly in the last twenty years, modern de novo assemblers still struggle to accurately reconstruct these highly repetitive regions. Once sequenced reads are long enough to span all repetitive regions, the problem will become trivial. However, even the third generation of sequencing technologies on the market cannot yet produce reads that are sufficiently long (and accurate) to span every repetitive region in large eukaryotic genomes.

In this work, we introduce a novel algorithm called RAmbler to resolve complex repetitive regions based on high-quality long reads (i.e., PacBio HiFi). We first identify repetitive regions by mapping the HiFi reads to the draft genome assembly and by detecting unusually high mapping coverage. Then, (i) we compute the k-mers that are expected to occur only once in the genome (i.e., single-copy k-mers, which we call unikmers), (ii) we barcode the HiFi reads based on the presence and the location of their unikmers, (iii) we compute an overlap graph solely based on shared barcodes, and (iv) we reconstruct the sequence of the repetitive region by traversing the overlap graph.

We present an extensive set of experiments comparing the performance of RAmbler against Hifiasm, HiCANU and Verkko on synthetic HiFi reads generated over a wide range of repeat lengths, numbers of repeats, heterozygosity rates and sequencing depths (over 140 data sets). Our experimental results indicate that RAmbler outperforms Hifiasm, HiCANU and Verkko on the large majority of the inputs. We also show that RAmbler can resolve several long tandem repeats in Arabidopsis thaliana using real HiFi reads.

The code for RAmbler is available at https://github.com/sakshar/rambler.

CCS CONCEPTS
• Applied computing → Bioinformatics; Computational genomics; Molecular sequence analysis; • Theory of computation → Graph algorithms analysis.
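Steps (i) to (iii) of the abstract can be illustrated with a toy sketch. This is not RAmbler's implementation (see the repository for that). The function names, the count window `[lo, hi]` used to decide that a k-mer is single copy, and the `min_shared` threshold are all assumptions made for illustration. Step (iv), traversing the graph to reconstruct the sequence, is not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations


def kmers(seq, k):
    """Yield (position, k-mer) for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def find_unikmers(reads, k, lo, hi):
    """Return k-mers whose total count across all reads lies in [lo, hi].

    Assumption: a k-mer seen roughly as often as the sequencing depth
    is treated as occurring once in the genome.
    """
    counts = Counter(km for read in reads for _, km in kmers(read, k))
    return {km for km, c in counts.items() if lo <= c <= hi}


def barcode(read, k, unikmers):
    """Ordered (position, unikmer) pairs found in one read."""
    return [(i, km) for i, km in kmers(read, k) if km in unikmers]


def overlap_graph(barcodes, min_shared=1):
    """Link two reads when their barcodes share >= min_shared unikmers."""
    graph = defaultdict(set)
    for a, b in combinations(range(len(barcodes)), 2):
        shared = {km for _, km in barcodes[a]} & {km for _, km in barcodes[b]}
        if len(shared) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Toy input: two overlapping reads plus one low-complexity read.
reads = ["ACGTAC", "GTACGG", "TTTTTT"]
uni = find_unikmers(reads, k=3, lo=2, hi=2)
codes = [barcode(r, 3, uni) for r in reads]
graph = overlap_graph(codes)
```

In this toy input the first two reads are linked because they share unikmers, while the third read has an empty barcode and stays out of the graph. The all-pairs comparison is quadratic in the number of reads; a practical tool would more likely index reads by unikmer.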

