
Scalable Species Tree Inference with External Constraints

Abstract: Species tree inference under the multi-species coalescent (MSC) model is a basic step in biological discovery. Despite the development in recent years of methods that are proven statistically consistent and that have high accuracy, large datasets create computational challenges. Although there is generally some information available about the species tree that could be used to speed up the estimation, only one method (ASTRAL-J, a recent development in the ASTRAL family of methods) is able to use this information. Here we describe two new methods, NJst-J and FASTRAL-J, that can estimate the species tree given partial knowledge of the species tree in the form of a non-binary unrooted constraint tree. We show that both NJst-J and FASTRAL-J are much faster than ASTRAL-J, and we prove that all three methods are statistically consistent under the multi-species coalescent model subject to this constraint. Our extensive simulation study shows that both FASTRAL-J and NJst-J provide advantages over ASTRAL-J: both are faster (and NJst-J is particularly fast), and FASTRAL-J is generally at least as accurate as ASTRAL-J. An analysis of the Avian Phylogenomics project dataset with 48 species and 14,446 genes presents additional evidence of the value of FASTRAL-J over ASTRAL-J (and both over ASTRAL), with dramatic reductions in running time (20 hours for default ASTRAL, and minutes or seconds for ASTRAL-J and FASTRAL-J, respectively).

Availability: FASTRAL-J and NJst-J are available in open source form at https://github.com/RuneBlaze/FASTRAL-constrained and https://github.com/RuneBlaze/NJst-constrained. Locations of the datasets used in this study and detailed commands needed to reproduce the study are provided in the supplementary materials at http://tandy.cs.illinois.edu/baqiao-suppl.pdf.
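
To make the notion of a "non-binary unrooted constraint tree" concrete, the sketch below checks whether a candidate species tree respects such a constraint. It is an illustration only and is not taken from the FASTRAL-J, NJst-J, or ASTRAL-J code. It assumes that both trees have the same leaf set, that trees are written as nested Python tuples of leaf names with an arbitrary rooting, and that "respecting the constraint" means every non-trivial bipartition of the constraint tree also appears in the candidate. The function names `leaf_set`, `bipartitions`, and `refines` are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the result does not depend on where the tree is rooted.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def refines(candidate, constraint):
    """True if the candidate contains every bipartition of the constraint."""
    if leaf_set(candidate) != leaf_set(constraint):
        raise ValueError("this sketch assumes identical leaf sets")
    return bipartitions(constraint) <= bipartitions(candidate)


# Constraint: A, B, C group together; everything else is unresolved.
constraint = (("A", "B", "C"), "D", "E", "F")
ok_tree = ((("A", "B"), "C"), ("D", ("E", "F")))
bad_tree = ((("A", "D"), "B"), ("C", ("E", "F")))

print(refines(ok_tree, constraint))   # the ABC | DEF split is present
print(refines(bad_tree, constraint))  # the ABC | DEF split is missing
```

Storing each bipartition as the side without the anchor leaf is one simple way to compare trees that were written down with different rootings; a production tool would more likely parse Newick input and use bitsets.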

