Clustering de Novo by Gene of Long Reads from Transcriptomics Data
Abstract
Long-read sequencing currently provides sequences of several thousand base pairs. This makes it possible to obtain complete transcripts, offering an unprecedented view of the cellular transcriptome.
However, the literature lacks tools to cluster such data de novo, in particular for Oxford Nanopore Technologies reads, because of their inherently high error rate compared to short reads.
Our goal is to process reads from whole-transcriptome sequencing data accurately and without a reference genome, in order to reliably group reads coming from the same gene. This de novo approach is therefore particularly suitable for non-model species, but can also serve as a useful pre-processing step to improve read mapping. Our contribution is twofold: a new algorithm adapted to clustering reads by gene, and a practical, freely available tool that scales to the complete processing of eukaryotic transcriptomes.
We sequenced a mouse RNA sample using the MinION device and used this dataset to compare our solution to other algorithms used in the context of biological clustering. We demonstrate that it is better suited to transcriptomics long reads. When a reference is available and mapping is therefore possible, we show that it stands as an alternative method that predicts complementary clusters.
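To make the idea of reference-free clustering concrete, here is a minimal sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. It swaps in a deliberately simple technique: reads are compared by the Jaccard similarity of their k-mer sets, and any pair above a threshold is merged with union-find (single-linkage connected components). The function names, the toy reads, `k=5`, and `threshold=0.3` are all assumptions chosen for illustration.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_reads(reads, k=5, threshold=0.3):
    """Group read indices whose k-mer sets are similar.

    Illustrative only: all-vs-all comparison is quadratic and would not
    scale to a real transcriptome; k and threshold are hypothetical values.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    kmer_sets = [kmers(read, k) for read in reads]
    for i, j in combinations(range(len(reads)), 2):
        if jaccard(kmer_sets[i], kmer_sets[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Toy reads: 0/1 and 2/3 each differ by a single substitution.
    toy_reads = [
        "ACGTACGTTAGCCGATAGGC",
        "ACGTACGTTAGCCGATTGGC",
        "TTTTGGGGCCCCAAAATTTT",
        "TTTTGGGGCCCCAAAATTAT",
    ]
    print(cluster_reads(toy_reads))  # [[0, 1], [2, 3]]
```

With real Nanopore reads, the high error rate mentioned above destroys many shared k-mers, so a fixed threshold like this one would need careful tuning; that difficulty is part of what motivates a dedicated method.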
Related Results
The Kernel Rough K-Means Algorithm
Background:
Clustering is one of the most important data mining methods. The k-means (c-means) and its derivative methods are the hotspot in the field of clustering research in re...
Expression and polymorphism of genes in gallstones
ABSTRACT
Through the method of clinical case control study, to explore the expression and genetic polymorphism of KLF14 gene (rs4731702 and rs972283) and SR-B1 gene (rs...
Image clustering using exponential discriminant analysis
Local learning based image clustering models are usually employed to deal with images sampled from the non-linear manifold. Recently, linear discriminant analysis (LDA) based vario...
GraphK-LR: Enhancing Long-read Metagenomic Binning with Read-overlap Graphs Across Microbial Kingdoms
Abstract
Background: Metagenomics, the study of genetic material from environmental samples, relies on binning - the process of grouping DNA sequences from the same organis...
12481 Evaluating The Device Handling And Preference Assessment Questionnaire For Growth Hormone Deficiency: Results From Content Validation Interviews
Abstract
Disclosure: J. Neergaard: Employee; Self; Novo Nordisk. S. Akhtar: Employee; Self; Novo Nordisk. Stock Owner; Self; Novo Nordisk. B. Berg: Employee; Self; N...
A COMPARATIVE ANALYSIS OF K-MEANS AND HIERARCHICAL CLUSTERING
Clustering is the process of arranging comparable data elements into groups. One of the most frequent data mining analytical techniques is clustering analysis; the clustering algor...
Machine Learning Methodologies for Clustering Gene Expression Data in Cancer
Gene expression data hide vital information required to understand the biological process that takes place in a particular organism. Extracting the hidden patterns in gene expressi...
Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...

