Network science inspires novel tree shape statistics
Abstract
The shape of phylogenetic trees can be used to gain evolutionary insights. A tree’s shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.
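As a rough illustration of the kind of summaries the abstract mentions, the sketch below computes the Sackin and Colless imbalance, together with the unweighted diameter, in a single post-order pass over a rooted binary tree. It is not the treeCentrality package or its API: the nested-tuple tree encoding, the `ShapeSummary` type and the `summarize` function are hypothetical names chosen for this example, and the diameter is assumed to be counted in edges, ignoring branch lengths.

```python
from typing import NamedTuple, Union

# Hypothetical toy encoding (not from the paper): a leaf is a string label,
# an internal node is a (left, right) pair of subtrees.
Tree = Union[str, tuple]


class ShapeSummary(NamedTuple):
    leaves: int    # number of tips in the subtree
    sackin: int    # sum of tip depths, measured from the overall root
    colless: int   # sum over internal nodes of |tips on left - tips on right|
    height: int    # edges from the subtree root to its deepest tip
    diameter: int  # edges on the longest path between two nodes of the subtree


def summarize(tree: Tree, depth: int = 0) -> ShapeSummary:
    """Visit every node once, so the cost is linear in the size of the tree.

    The recursion is kept for readability; a very deep tree would need an
    iterative traversal to stay within Python's recursion limit.
    """
    if not isinstance(tree, tuple):
        # A tip contributes its own depth to the Sackin index.
        return ShapeSummary(1, depth, 0, 0, 0)
    left, right = (summarize(child, depth + 1) for child in tree)
    return ShapeSummary(
        leaves=left.leaves + right.leaves,
        sackin=left.sackin + right.sackin,
        colless=left.colless + right.colless + abs(left.leaves - right.leaves),
        height=1 + max(left.height, right.height),
        # The longest path either stays inside one child or passes through this node.
        diameter=max(left.diameter, right.diameter, left.height + right.height + 2),
    )


caterpillar = ((("a", "b"), "c"), "d")  # maximally unbalanced on four tips
balanced = (("a", "b"), ("c", "d"))     # fully balanced on four tips

print(summarize(caterpillar))  # sackin=9, colless=3, height=3, diameter=4
print(summarize(balanced))     # sackin=8, colless=0, height=2, diameter=4
```

On these two four-tip trees the imbalance indices differ while the diameter coincides, a small reminder that different shape summaries capture different aspects of a tree.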

