Localized assembly for long reads enables genome-wide analysis of repetitive regions at single-base resolution in human genomes
Abstract
Background
Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, it remains hard to characterize repetitive sequences by reconstructing genomic structures at high resolution solely from long reads. Here, we developed a localized assembly method (LoMA) that constructs highly accurate consensus sequences (CSs) from long reads.
Methods
We first developed LoMA by combining minimap2, MAFFT, and our own algorithm, which classifies diploid haplotypes based on structural variants and constructs CSs. Using this tool, we analyzed two human samples (NA18943 and NA19240) sequenced with the Oxford Nanopore sequencer. We defined target regions in each genome based on mapping patterns and then constructed a high-quality catalog of human insertions solely from the long-read data.
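To make the idea of a consensus sequence concrete, the following is a minimal sketch of column-wise majority voting over reads that have already been multiply aligned (for example, gapped rows of the kind MAFFT produces). It is not LoMA's actual algorithm, which also performs haplotype classification; the function name `majority_consensus` and the toy reads are assumptions made for illustration only.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Build a consensus by majority vote over each alignment column.

    `aligned_reads` is a list of equal-length strings in which `gap`
    marks an alignment gap. Columns whose most frequent symbol is the
    gap are dropped from the consensus.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(row) != width for row in aligned_reads):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Most frequent symbol in this column (ties go to the first seen).
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Hypothetical toy alignment: four noisy reads covering the same locus.
rows = [
    "ACGT-ACGTA",
    "ACGTTACG-A",
    "ACCT-ACGTA",
    "ACGT-ACGTA",
]
print(majority_consensus(rows))
```

In this toy input each read carries at most one error, and the vote removes the isolated insertion, deletion, and substitution because no two reads share the same mistake.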
Results
The assessment of LoMA showed that CSs were highly accurate (error rate < 0.3%) compared with raw data (error rate > 8%) and that LoMA was superior to the previous study. The genome-wide analysis of NA18943 and NA19240 identified 5,516 and 6,542 insertions (≥ 100 bp), respectively. Most insertions (∼80%) were derived from tandem repeats and transposable elements. We also detected processed pseudogenes, insertions in transposable elements, and long insertions (> 10 kbp). Further, our analysis suggested that short tandem duplications were associated with gene expression and transposons.
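The abstract does not state how the error rates were computed. One common convention, assumed here purely for illustration, is the edit distance between a sequence and a trusted reference divided by the reference length. The helper names `edit_distance` and `error_rate` below are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(seq, truth):
    """Edit distance to `truth`, normalized by the length of `truth`."""
    if not truth:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(seq, truth) / len(truth)


# Hypothetical example: one substitution in a 4-base sequence.
print(error_rate("ACGA", "ACGT"))
```

Under this definition, a consensus with an error rate below 0.3% would differ from the reference by fewer than 3 edits per 1,000 reference bases.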
Conclusions
Our analysis showed that LoMA constructs high-quality sequences from long reads that contain substantial errors. This study revealed the true structures of insertions with high accuracy and inferred the mechanisms underlying them. Our approach contributes to future human genome studies. LoMA is available at our GitHub page: https://github.com/kolikem/loma.

