Critical assessment of pan-genomics of metagenome-assembled genomes
Abstract
Background
Large-scale metagenome assembly and binning to generate metagenome-assembled genomes (MAGs) has become possible in the past five years. As a result, millions of MAGs have been produced and are increasingly included in pan-genomics workflows. However, pan-genome analyses of MAGs may suffer from the known issues with MAGs: fragmentation, incompleteness, and contamination, which arise from mis-assembly and mis-binning. Here, we conducted a critical assessment of including MAGs in pan-genome analysis by comparing pan-genome analysis results of complete bacterial genomes and simulated MAGs.
Results
We found that incompleteness led to more significant core gene loss than fragmentation. Contamination had little effect on core genome size but had a major influence on accessory genomes. The core gene loss remained when using different pan-genome analysis tools and when using a mixture of MAGs and complete genomes. Importantly, the core gene loss was partially alleviated by lowering the core gene threshold and using gene prediction algorithms that consider fragmented genes, but to a lesser degree when incompleteness was higher than 5%. The core gene loss also led to incorrect pan-genome functional predictions and inaccurate phylogenetic trees.
Conclusions
We conclude that lowering the core gene threshold and predicting genes in metagenome mode (as Anvi’o does with Prodigal) are necessary in pan-genome analysis of MAGs to alleviate the accuracy loss. Better quality control of MAGs and development of new pan-genome analysis tools specifically designed for MAGs are needed in future studies.
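The effect of the core gene threshold described in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulation or any tool's implementation: it assumes each simulated MAG loses a uniformly random fraction of its genes, and the names `simulate_incompleteness` and `core_genes`, the gene labels, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import random


def simulate_incompleteness(genomes, fraction_lost, seed=0):
    """Drop a random fraction of genes from each genome (toy MAG model, an assumption)."""
    rng = random.Random(seed)
    mags = []
    for genes in genomes:
        ordered = sorted(genes)  # sort first so sampling is reproducible
        n_keep = round(len(ordered) * (1.0 - fraction_lost))
        mags.append(set(rng.sample(ordered, n_keep)))
    return mags


def core_genes(genomes, threshold=1.0):
    """Return genes present in at least `threshold` (a fraction) of the genomes."""
    counts = {}
    for genes in genomes:
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    n = len(genomes)
    return {gene for gene, c in counts.items() if c / n >= threshold}


# Toy data: 20 complete genomes sharing 1000 genes, each with 50 unique genes.
shared = {f"core_{i}" for i in range(1000)}
complete = [shared | {f"acc_{k}_{j}" for j in range(50)} for k in range(20)]

# Simulated MAGs at 5% incompleteness (hypothetical setting).
mags = simulate_incompleteness(complete, fraction_lost=0.05)

strict = core_genes(mags, threshold=1.0)   # gene must be in every MAG
relaxed = core_genes(mags, threshold=0.9)  # gene may be missing from a few MAGs
print(len(core_genes(complete)), len(strict), len(relaxed))
```

Under this random-loss assumption, a gene missing from even one MAG drops out of the strict core, while the relaxed threshold tolerates a few absences. Real MAGs also suffer from fragmentation and contamination, which this sketch does not model.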
Related Results
Genomics and society: four scenarios for 2015
This paper develops four alternative scenarios depicting possible futures for genomics applications within a broader social context. The scenarios integrate forecasts for future ge...
Pan-genome analysis of six Paracoccus type strain genomes reveal lifestyle traits
The genus Paracoccus, capable of inhabiting a variety of ecological niches, both marine and terrestrial, is globally distributed. In addition, Paracoccus is taxonomically,...
COBRA improves the quality of viral genomes assembled from metagenomes
Abstract
Microbial and viral diversity, distribution, and ecological impacts are often studied using metagenome-assembled sequences, but genome incompleteness hampers comprehensive ...
A systematic comparison of eight new plastome sequences from Ipomoea L
Background
Ipomoea is the largest genus in the family Convolvulaceae. The species in this genus have been widely used in many fields, such as agriculture, nutrition, and medicine. ...
METAGENOMICS CURRENT RESEARCH, APPLICATION AND COMPUTATIONAL ANALYSIS
Metagenomics combines genomics with the prefix "meta", here referring to a large set of genomes from different organisms. Metagenomics is also called environmental genomics or commun...
Machine Learning-Based Comparative Analysis of Pan-Cancer and Pan-Normal Tissues Identifies Pan-Cancer Tissue-Enriched circRNAs Related to Cancer Mutations as Potential Exosomal Biomarkers
A growing body of evidence has shown that circular RNA (circRNA) is a promising exosomal cancer biomarker candidate. However, global circRNA alterations in cancer and the underlyin...
High-quality metagenome assembly from nanopore reads with nanoMDBG
Abstract
Third-generation long-read sequencing technologies have been shown to significantly enhance the quality of metagenome assemblies. The results obtained using the highly acc...
Network Analysis for Estimating Standardization Trends in Genomics
Abstract
With the development of biotechnology in genomics, such as droplet digital PCR, sequencing device, gene analysis software, an increase in the clinical application ...

