RAmbler: de novo genome assembly of complex repetitive regions
ABSTRACT

Complex repetitive regions (also called segmental duplications) in eukaryotic genomes often contain essential functional and regulatory information. Despite remarkable algorithmic progress in genome assembly over the last twenty years, modern de novo assemblers still struggle to accurately reconstruct these highly repetitive regions. Once sequencing reads are long enough to span all repetitive regions, the problem will become trivial. However, even the third-generation sequencing technologies currently on the market cannot yet produce reads that are sufficiently long (and accurate) to span every repetitive region in large eukaryotic genomes.

In this work, we introduce a novel algorithm called RAmbler to resolve complex repetitive regions based on high-quality long reads (i.e., PacBio HiFi). We first identify repetitive regions by mapping the HiFi reads to the draft genome assembly and detecting unusually high mapping coverage. Then, (i) we compute the k-mers that are expected to occur only once in the genome (i.e., single-copy k-mers, which we call unikmers), (ii) we barcode the HiFi reads based on the presence and the location of their unikmers, (iii) we compute an overlap graph based solely on shared barcodes, and (iv) we reconstruct the sequence of the repetitive region by traversing the overlap graph.

We present an extensive set of experiments comparing the performance of RAmbler against Hifiasm, HiCANU and Verkko on synthetic HiFi reads generated over a wide range of repeat lengths, numbers of repeats, heterozygosity rates and sequencing depths (over 140 data sets). Our experimental results indicate that RAmbler outperforms Hifiasm, HiCANU and Verkko on the large majority of the inputs. We also show that RAmbler can resolve several long tandem repeats in Arabidopsis thaliana using real HiFi reads.

The code for RAmbler is available at https://github.com/sakshar/rambler.

CCS CONCEPTS

• Applied computing → Bioinformatics; Computational genomics; Molecular sequence analysis; • Theory of computation → Graph algorithms analysis.
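As a rough illustration of steps (i) to (iii) in the abstract, the following minimal Python sketch selects candidate unikmers by k-mer count, barcodes each read with the positions of its unikmers, and links reads that share enough unikmers. It is not the RAmbler implementation: the function names, the count window `lo`/`hi` used to approximate "single-copy", the `min_shared` threshold, and the decision to ignore reverse complements are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def kmers(seq, k):
    """Yield (position, k-mer) for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def find_unikmers(reads, k, lo, hi):
    """Return k-mers whose total count over all reads lies in [lo, hi].

    Assumption: a count window around the sequencing depth is used as a
    stand-in for "expected to occur once in the genome".
    """
    counts = Counter(km for read in reads for _, km in kmers(read, k))
    return {km for km, c in counts.items() if lo <= c <= hi}


def barcode(read, k, unikmers):
    """Barcode a read as the ordered list of (position, unikmer) it contains."""
    return [(i, km) for i, km in kmers(read, k) if km in unikmers]


def overlap_graph(barcodes, min_shared):
    """Connect two reads if they share at least min_shared unikmers.

    barcodes maps a read id to its barcode; the result maps a pair of
    read ids (a, b) with a < b to the number of shared unikmers.
    """
    index = defaultdict(set)  # unikmer -> ids of reads containing it
    for rid, bc in barcodes.items():
        for _, km in bc:
            index[km].add(rid)
    shared = Counter()
    for rids in index.values():
        for a, b in combinations(sorted(rids), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Toy input: reads 0 and 1 overlap; read 2 consists only of high-count k-mers.
reads = ["AAACCCGG", "CCGGGTTT", "TATATATA"]
unikmers = find_unikmers(reads, k=3, lo=1, hi=2)
barcodes = {rid: barcode(read, 3, unikmers) for rid, read in enumerate(reads)}
graph = overlap_graph(barcodes, min_shared=2)
print(graph)  # {(0, 1): 2}
```

In the toy input, the k-mers TAT and ATA each occur three times and fall outside the count window, so read 2 receives an empty barcode and stays unconnected, while reads 0 and 1 are linked through the shared unikmers CCG and CGG. Step (iv), traversing the graph to reconstruct the sequence, is not covered by this sketch.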