
An Empirical Demonstration of Unsupervised Machine Learning in Species Delimitation

Abstract
One major challenge to delimiting species with genetic data is successfully differentiating species divergences from population structure, with some current methods biased towards overestimating species numbers. Many fields of science are now utilizing machine learning (ML) approaches, and in systematics and evolutionary biology, supervised ML algorithms have recently been incorporated to infer species boundaries. However, these methods require the creation of training data with associated labels. Unsupervised ML, on the other hand, uses the inherent structure in data and hence does not require any user-specified training labels, thus providing a more objective approach to species delimitation. In the context of integrative taxonomy, we demonstrate the utility of three unsupervised ML approaches, specifically random forests, variational autoencoders, and t-distributed stochastic neighbor embedding, for species delimitation utilizing a short-range endemic harvestman taxon (Laniatores, Metanonychus). First, we combine mitochondrial data with examination of male genitalic morphology to identify a priori species hypotheses. Then we use single nucleotide polymorphism data derived from sequence capture of ultraconserved elements (UCEs) to test the efficacy of unsupervised ML algorithms in successfully identifying a priori species, comparing results to commonly used genetic approaches. Finally, we use two validation methods to assess a priori species hypotheses using UCE data. We find that unsupervised ML approaches successfully cluster samples according to species-level divergences and not to high levels of population structure, while standard model-based validation methods over-split species, in some instances suggesting that all sampled individuals are distinct species. Moreover, unsupervised ML approaches offer the benefits of better data visualization in two-dimensional space and the ability to accommodate various data types.
We argue that ML methods may be better suited for species delimitation relative to currently used model-based validation methods, and that species delimitation in a truly integrative framework provides more robust final species hypotheses relative to separating delimitation into distinct “discovery” and “validation” phases. Unsupervised ML is a powerful analytical approach that can be incorporated into many aspects of systematic biology, including species delimitation. Based on results of our empirical dataset, we make several taxonomic changes including description of a new species.
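
The core idea of the abstract, grouping samples by the inherent structure of SNP data without any training labels, can be illustrated with a small sketch. This is not the authors' pipeline and does not use any of the three methods they name (random forests, variational autoencoders, t-SNE); average-linkage hierarchical clustering is swapped in as a dependency-free stand-in. The genotypes are simulated, and every name and parameter below (`simulate_genotypes`, the allele frequencies, the number of loci, the choice of two clusters) is an assumption made for illustration only.

```python
import random
from itertools import combinations


def simulate_genotypes(n_samples, n_loci, allele_freq, rng):
    """Draw diploid SNP genotypes (0, 1, or 2 alternate-allele copies per locus)."""
    return [
        [(rng.random() < allele_freq) + (rng.random() < allele_freq)
         for _ in range(n_loci)]
        for _ in range(n_samples)
    ]


def genotype_distance(a, b):
    """Mean absolute difference in allele counts across loci."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def average_linkage(samples, k):
    """Merge the two closest clusters (mean pairwise distance) until k remain."""
    n = len(samples)
    dist = [[genotype_distance(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # combinations() yields index pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical data: two strongly diverged lineages, six samples each.
rng = random.Random(0)
lineage_a = simulate_genotypes(6, 50, 0.1, rng)  # samples 0-5
lineage_b = simulate_genotypes(6, 50, 0.9, rng)  # samples 6-11
samples = lineage_a + lineage_b

# No labels are passed in; only the genotype matrix and the cluster count.
clusters = sorted(average_linkage(samples, k=2))
print(clusters)
```

Note that the number of clusters is supplied by hand here, which is a simplification; the sketch only shows that strongly diverged groups can be recovered from genotype distances alone, and it says nothing about how such a method behaves under the high population structure discussed in the abstract.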

