
RAmbler: de novo genome assembly of complex repetitive regions

ABSTRACT

Complex repetitive regions (also called segmental duplications) in eukaryotic genomes often contain essential functional and regulatory information. Despite remarkable algorithmic progress in genome assembly over the last twenty years, modern de novo assemblers still struggle to accurately reconstruct these highly repetitive regions. If sequenced reads were long enough to span all repetitive regions, the problem would be solved trivially. However, even the third generation of sequencing technologies on the market cannot yet produce reads that are sufficiently long (and accurate) to span every repetitive region in large eukaryotic genomes.

In this work, we introduce a novel algorithm called RAmbler to resolve complex repetitive regions based on high-quality long reads (i.e., PacBio HiFi). We first identify repetitive regions by mapping the HiFi reads to the draft genome assembly and detecting unusually high mapping coverage. Then, (i) we compute the k-mers that are expected to occur only once in the genome (i.e., single-copy k-mers, which we call unikmers), (ii) we barcode the HiFi reads based on the presence and location of their unikmers, (iii) we compute an overlap graph based solely on shared barcodes, and (iv) we reconstruct the sequence of the repetitive region by traversing the overlap graph.

We present an extensive set of experiments comparing the performance of RAmbler against Hifiasm, HiCANU and Verkko on synthetic HiFi reads generated over a wide range of repeat lengths, numbers of repeats, heterozygosity rates and sequencing depths (over 140 data sets). Our experimental results indicate that RAmbler outperforms Hifiasm, HiCANU and Verkko on the large majority of the inputs. We also show that RAmbler can resolve several long tandem repeats in Arabidopsis thaliana using real HiFi reads.

The code for RAmbler is available at https://github.com/sakshar/rambler.

CCS CONCEPTS
• Applied computing → Bioinformatics; Computational genomics; Molecular sequence analysis; • Theory of computation → Graph algorithms analysis.
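
Steps (i) to (iii) of the abstract can be illustrated with a minimal Python sketch. It is not the RAmbler implementation: the function names, the count window `lo`/`hi` used as a proxy for "expected to occur only once in the genome", and the `min_shared` threshold are all assumptions made for illustration, and the toy reads are invented. The idea shown is that k-mers whose frequency in the reads is close to the sequencing depth are treated as single-copy, each read is reduced to the ordered positions of those k-mers, and two reads are linked when their barcodes share enough unikmers.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Yield (position, k-mer) for every k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def find_unikmers(reads, k, lo, hi):
    """Return k-mers whose total count over all reads lies in [lo, hi].

    Assumption: a count near the sequencing depth suggests a single-copy
    k-mer; lower counts look like errors, higher counts look repetitive.
    """
    counts = Counter(km for seq in reads.values() for _, km in kmers(seq, k))
    return {km for km, c in counts.items() if lo <= c <= hi}


def barcode(seq, k, unikmers):
    """Barcode a read as the ordered list of (position, unikmer) it contains."""
    return [(i, km) for i, km in kmers(seq, k) if km in unikmers]


def overlap_graph(barcodes, min_shared):
    """Link two reads when their barcodes share at least min_shared unikmers.

    Returns {(read_a, read_b): number_of_shared_unikmers}.
    """
    edges = {}
    for a, b in combinations(sorted(barcodes), 2):
        shared = {km for _, km in barcodes[a]} & {km for _, km in barcodes[b]}
        if len(shared) >= min_shared:
            edges[(a, b)] = len(shared)
    return edges


# Toy input (hypothetical): depth is about 2, so unikmers are k-mers seen twice.
reads = {"r1": "ACGTAC", "r2": "GTACGG", "r3": "TTTTTT"}
uni = find_unikmers(reads, k=3, lo=2, hi=2)
codes = {name: barcode(seq, 3, uni) for name, seq in reads.items()}
edges = overlap_graph(codes, min_shared=2)
```

In this toy input, `TTT` is excluded because it occurs four times, so read `r3` receives an empty barcode and no edges. The all-pairs comparison is quadratic in the number of reads and is kept only for brevity; an index from unikmer to reads would avoid it. Step (iv), traversing the graph to reconstruct the sequence, is not sketched here because the abstract does not describe the traversal.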

