PanForest: predicting genes in genomes using random forests
Abstract
Motivation
The presence or absence of some genes in a genome can influence whether other genes are likely to be present or absent. Understanding these gene co-occurrence and avoidance patterns reveals fundamental principles of genome organization, with applications ranging from evolutionary reconstruction to rational design of synthetic genomes.
Results
PanForest, presented here, uses random forest classifiers to predict the presence and absence of genes in genomes from the set of other genes present. Performance statistics output by PanForest reveal how predictable each gene’s presence or absence is, based on the presence or absence of other genes in the genome. Further, PanForest produces statistics indicating the importance of each gene in predicting the presence or absence of each other gene. The PanForest software can run serially or in parallel, thereby facilitating the analysis of pangenomes at Network of Life scale.
A pangenome of 12 741 accessory genes in 1000 Escherichia coli genomes was analysed in around 5 h using eight processors. To demonstrate PanForest’s utility, we present a case study and show that certain genes associated with resistance to antimicrobial drugs reliably predict the presence or absence of other genes associated with resistance to the same drug. Further, we highlight several associations between those genes and others that are either not known to be associated with antimicrobial resistance (AMR) or are associated with resistance to other drugs. We envisage PanForest’s use in studies of the dynamics of gene distributions in pangenomes across multiple disciplines, ranging from biomedical science and synthetic biology to molecular ecology.
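The idea described in the Results can be illustrated with a minimal sketch. This is not PanForest's implementation: it is a small, standard-library-only random forest written for illustration, and the toy presence/absence matrix, the gene names (geneA, geneB, noise1 to noise4), the function names and all parameter values are hypothetical. It trains a forest to predict one gene from the other genes in each genome, then reports a held-out accuracy (how predictable the gene is) and a normalised impurity-based importance for each predictor gene.

```python
import math
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def grow_tree(X, y, rng, importance, n_total, depth=0, max_depth=4):
    """Grow one tree on binary features. A leaf is an int, a node a tuple."""
    majority = int(2 * sum(y) >= len(y))
    if depth == max_depth or gini(y) == 0.0:
        return majority
    n_features = len(X[0])
    # Consider only a random subset of genes at each split.
    candidates = rng.sample(range(n_features), max(1, int(math.sqrt(n_features))))
    best = None
    for f in candidates:
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] == 1]
        if not left or not right:
            continue
        child = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        gain = gini(y) - child
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, f, left, right)
    if best is None:
        return majority
    gain, f, left, right = best
    # Accumulate the impurity decrease, weighted by the share of samples at this node.
    importance[f] += gain * len(y) / n_total
    absent, present = [
        grow_tree([X[i] for i in idx], [y[i] for i in idx],
                  rng, importance, n_total, depth + 1, max_depth)
        for idx in (left, right)
    ]
    return (f, absent, present)

def predict_tree(tree, row):
    while not isinstance(tree, int):
        f, absent, present = tree
        tree = present if row[f] else absent
    return tree

def fit_forest(X, y, n_trees=25, seed=0):
    """Train trees on bootstrap samples; return trees and normalised importances."""
    rng = random.Random(seed)
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                               rng, importance, len(idx)))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]

def predict_forest(trees, row):
    votes = sum(predict_tree(t, row) for t in trees)
    return int(2 * votes >= len(trees))

# Hypothetical toy pangenome: rows are genomes, columns are genes (1 = present).
# geneB co-occurs with geneA (5% of genomes flipped); the other genes are random.
data_rng = random.Random(42)
genes = ["geneA", "geneB", "noise1", "noise2", "noise3", "noise4"]
matrix = []
for _ in range(200):
    a = data_rng.randint(0, 1)
    b = a if data_rng.random() > 0.05 else 1 - a
    matrix.append([a, b] + [data_rng.randint(0, 1) for _ in range(4)])

# Predict geneB from every other gene.
target = genes.index("geneB")
predictors = [g for g in genes if g != "geneB"]
X = [[v for j, v in enumerate(row) if j != target] for row in matrix]
y = [row[target] for row in matrix]

trees, importance = fit_forest(X[:150], y[:150])
accuracy = sum(predict_forest(trees, r) == t
               for r, t in zip(X[150:], y[150:])) / 50
top_predictor = max(zip(importance, predictors))[1]
print(f"held-out accuracy for geneB: {accuracy:.2f}")
print(f"most important predictor: {top_predictor}")
```

In a pangenome-wide analysis of the kind described above, one such model would be fitted per gene, and because the models are independent of one another the work could be divided among processors.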
Availability and implementation
The software is freely available with a full manual at www.github.com/alanbeavan/PanForest (DOI: https://doi.org/10.5281/zenodo.17865482).
Oxford University Press (OUP)

