Localized assembly for long reads enables genome-wide analysis of repetitive regions at single-base resolution in human genomes

Abstract

Background: Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, it remains hard to characterize repetitive sequences by reconstructing genomic structures at high resolution solely from long reads. Here, we developed a localized assembly method (LoMA) that constructs highly accurate consensus sequences (CSs) from long reads.

Methods: We developed LoMA by combining minimap2, MAFFT, and our own algorithm, which classifies diploid haplotypes based on structural variants and constructs CSs. Using this tool, we analyzed two human samples (NA18943 and NA19240) sequenced with the Oxford Nanopore sequencer. We defined target regions in each genome based on mapping patterns and then constructed a high-quality catalog of human insertions solely from the long-read data.

Results: The assessment of LoMA showed high accuracy of CSs (error rate < 0.3%) compared with raw data (error rate > 8%) and superior performance relative to a previous study. The genome-wide analysis of NA18943 and NA19240 identified 5,516 and 6,542 insertions (≥ 100 bp), respectively. Most insertions (~80%) were derived from tandem repeats and transposable elements. We also detected processed pseudogenes, insertions in transposable elements, and long insertions (> 10 kbp). Further, our analysis suggested that short tandem duplications were associated with gene expression and transposons.

Conclusions: Our analysis showed that LoMA constructs high-quality sequences from long reads that contain substantial errors. This study revealed the true structures of insertions with high accuracy and inferred mechanisms for the insertions. Our approach contributes to future human genome studies. LoMA is available at our GitHub page: https://github.com/kolikem/loma.
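
To illustrate the idea of building a consensus sequence from error-prone reads, here is a minimal Python sketch. It is not LoMA's actual implementation, which the abstract does not detail. It assumes the reads have already been aligned into equal-length, gap-padded rows (the kind of output a multiple sequence aligner such as MAFFT produces) and takes a simple per-column majority vote. The function names `majority_consensus` and `mismatch_rate` and the toy reads are hypothetical and chosen only for this example.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Return a consensus built by per-column majority vote.

    `aligned_reads` is a list of equal-length strings taken from a
    multiple sequence alignment; `gap` marks alignment gaps.
    This is an illustrative stand-in, not LoMA's algorithm.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(row) != width for row in aligned_reads):
        raise ValueError("all aligned rows must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Pick the most frequent symbol in this alignment column.
        symbol, _count = Counter(column).most_common(1)[0]
        # A column whose majority symbol is a gap contributes no base.
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


def mismatch_rate(aligned_a, aligned_b):
    """Fraction of alignment columns at which two aligned rows differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    if not aligned_a:
        return 0.0
    diffs = sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)
    return diffs / len(aligned_a)


# Toy data (hypothetical): five noisy reads of the same 8-base locus.
toy_rows = [
    "ACGTACGT",
    "ACG-ACGT",  # one deletion
    "ACGTACTT",  # one substitution
    "ACGTACGT",
    "AGGTACGT",  # one substitution
]
toy_consensus = majority_consensus(toy_rows)
```

In this toy case each error occurs in only one read per column, so the vote recovers the underlying sequence even though individual reads differ from it. Real long-read data would additionally require haplotype separation before voting, as the Methods describe, so that heterozygous variants are not mistaken for errors.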

Related Results

RAmbler: de novo genome assembly of complex repetitive regions
ABSTRACT Complex repetitive regions (also called segmental duplications) in eukaryotic genomes often contain essential functional and regulatory information. Despite remarkable algo...
MARS-seq2.0: an experimental and analytical pipeline for indexed sorting combined with single-cell RNA sequencing v1
Human tissues comprise trillions of cells that populate a complex space of molecular phenotypes and functions and that vary in abundance by 4–9 orders of magnitude. Relying solely ...
Localized assembly for long reads enables genome-wide analysis of repetitive regions at single-base resolution in human genomes
Abstract Background: Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, ...
Differential Diagnosis of Neurogenic Thoracic Outlet Syndrome: A Review
Abstract Thoracic outlet syndrome (TOS) is a complex and often overlooked condition caused by the compression of neurovascular structures as they pass through the thoracic outlet. ...
Bacterial genome annotation script using BLASTN v2
This protocol uses the command line tools provided by the Python package TnAtlas to identify and annotate transposon integration events in genomes. Given a set of sequencing reads...
A systematic comparison of eight new plastome sequences from Ipomoea L
Background Ipomoea is the largest genus in the family Convolvulaceae. The species in this genus have been widely used in many fields, such as agriculture, nutrition, and medicine. ...
Pangenome – its Aspect and Prospect
Pangenome is a very new discipline showing a collection of unique and variable genomes of any species in one model. This discipline is a combination of three subjects of Biology li...
GraphK-LR: Enhancing Long-read Metagenomic Binning with Read-overlap Graphs Across Microbial Kingdoms
Abstract Background: Metagenomics, the study of genetic material from environmental samples, relies on binning - the process of grouping DNA sequences from the same organis...