Search engine for discovering works of Art, research articles, and books related to Art and Culture
ShareThis
Javascript must be enabled to continue!

Comparison of three clustering approaches for detecting novel environmental microbial diversity

View through CrossRef
Discovery of novel diversity in high-throughput sequencing (HTS) studies is a central task in environmental microbial ecology. To evaluate the effects that amplicon clustering methods have on novel diversity discovery, we clustered an environmental marine protist HTS dataset of protist reads together with accessions from the taxonomically curated PR 2 reference database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm. The novel diversity uncovered by each clustering approach differed drastically in the number of operational taxonomic units (OTUs) and the number of environmental amplicons in these novel diversity OTUs. Global pairwise alignment comparisons revealed that numerous amplicons classified as novel by USEARCH and Swarm were actually highly similar to reference accessions. Using graph theory we found additional novel diversity within OTUs that would have gone unnoticed without further using their underlying network topologies. Our results suggest that novel diversity inferred from clustering approaches requires further validation, whereas graph theory provides a powerful tool for microbial ecology and the analyses of environmental HTS datasets.
Title: Comparison of three clustering approaches for detecting novel environmental microbial diversity
Description:
Discovery of novel diversity in high-throughput sequencing (HTS) studies is a central task in environmental microbial ecology.
To evaluate the effects that amplicon clustering methods have on novel diversity discovery, we clustered an environmental marine protist HTS dataset of protist reads together with accessions from the taxonomically curated PR 2 reference database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm.
The novel diversity uncovered by each clustering approach differed drastically in the number of operational taxonomic units (OTUs) and the number of environmental amplicons in these novel diversity OTUs.
Global pairwise alignment comparisons revealed that numerous amplicons classified as novel by USEARCH and Swarm were actually highly similar to reference accessions.
Using graph theory we found additional novel diversity within OTUs that would have gone unnoticed without further using their underlying network topologies.
Our results suggest that novel diversity inferred from clustering approaches requires further validation, whereas graph theory provides a powerful tool for microbial ecology and the analyses of environmental HTS datasets.

Related Results

The Kernel Rough K-Means Algorithm
The Kernel Rough K-Means Algorithm
Background: Clustering is one of the most important data mining methods. The k-means (c-means ) and its derivative methods are the hotspot in the field of clustering research in re...
Image clustering using exponential discriminant analysis
Image clustering using exponential discriminant analysis
Local learning based image clustering models are usually employed to deal with images sampled from the non‐linear manifold. Recently, linear discriminant analysis (LDA) based vario...
Optimizing machine learning techniques for genomics clustering
Optimizing machine learning techniques for genomics clustering
Optimisation des techniques d’apprentissage automatique pour le clustering génomique Dans le domaine de la bioinformatique, le clustering est une technique efficace...
Cash‐based approaches in humanitarian emergencies: a systematic review
Cash‐based approaches in humanitarian emergencies: a systematic review
This Campbell systematic review examines the effectiveness, efficiency and implementation of cash transfers in humanitarian settings. The review summarises evidence from five studi...
A Proposed Clustering Algorithm for Efficient Clustering of High-Dimensional Data
A Proposed Clustering Algorithm for Efficient Clustering of High-Dimensional Data
To partition transaction data values, clustering algorithms are used. To analyse the relationships between transactions, similarity measures are utilized. Similarity models based o...

Back to Top