novoSplice: A gene, splice and quality aware RNA-seq aligner

novoSplice is an RNA-seq aligner that uses gene sequences as search windows. This reduced indexing approach allows efficient, on-the-fly, hash-based indexing of genes, and it enables novoSplice to identify complicated splice junctions, count features while aligning reads, and penalize alignments differentially depending on the gene biotype. Read base qualities are used in the alignment process and in the refinement of the alignment score. To reduce reference bias, we implemented support for ambiguous (IUPAC) bases during indexing. To map single- or paired-end RNA-seq reads, novoSplice implements a two-pass approach in which a seed-and-vote algorithm is first invoked to find a set of candidate genes. Then, for each hit, a fast splice-aware chaining algorithm is called to obtain the best read alignment. If the chaining process fails, a modified splice-aware Needleman–Wunsch algorithm is used to produce the final alignment.

We benchmarked novoSplice against other state-of-the-art RNA-seq aligners following SimBA [1]. SimBA benchmarking focuses on read mapping to the genome, splice junction alignment, single nucleotide variants and indels, and fusions, based on simulated human RNA-seq reads (101 bp paired-end) for normal and somatic mutation datasets. We extended the scope to include additional RNA-seq aligners and simulated RNA-seq reads (100 bp and 150 bp paired-end) from multiple species (Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae). Currently, we are benchmarking five aligners: novoSplice, novoAlign, STAR [2], HISAT2 [3], and GSNAP [4]. In our preliminary benchmarking on the SimBA somatic and normal 101 bp datasets, novoSplice v0.4.1 shows consistently leading performance on both.
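The first pass of the mapping strategy described above (a hash-based gene index queried by a seed-and-vote step) can be illustrated with a minimal sketch. This is not novoSplice's implementation: the seed length `K`, the function names `build_gene_index` and `candidate_genes`, and the toy gene sequences are all assumptions chosen for illustration, and a real aligner would use longer seeds, strand handling, and positional voting.

```python
from collections import Counter, defaultdict

K = 4  # illustrative seed length (assumption); real aligners use longer seeds


def build_gene_index(genes, k=K):
    """Hash every k-mer of every gene to the set of gene names containing it."""
    index = defaultdict(set)
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_genes(read, index, k=K, top=2):
    """Each k-mer seed of the read casts one vote for every gene it occurs in."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for name in index.get(read[i:i + k], ()):
            votes[name] += 1
    return votes.most_common(top)


# Toy reference of two hypothetical genes
genes = {
    "geneA": "ACGTACGGTCAGTTGACC",
    "geneB": "TTTTGGGGCCCCAAAATT",
}
index = build_gene_index(genes)
hits = candidate_genes("GGTCAGTTG", index)
print(hits)
```

In this toy example all six seeds of the read occur only in `geneA`, so it is the sole candidate. In a two-pass design, only the genes returned here would be handed to the more expensive chaining or dynamic-programming alignment step.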
Extending the SimBA study to include more datasets from different species and read lengths will allow us to further investigate novoSplice's performance in comparison with other aligners, within the limits of simulated reads. novoSplice is in its beta phase and is currently undergoing further performance tuning of its paired-end and single-end short-read routines. Support for long reads and single-cell RNA-seq, as well as additional features (e.g., detection of gene fusions, back-splicing, and trans-splicing), will be available in the near future. novoSplice will be available at http://www.novocraft.com/products/novosplice for both Linux and macOS platforms.
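As a rough illustration of how base qualities and ambiguous (IUPAC) reference bases might enter an alignment score, the sketch below scores an ungapped read-to-reference comparison. The scoring scheme is an assumption made for illustration only (the abstract does not specify novoSplice's scoring function): a read base matches if it belongs to the set encoded by the IUPAC reference code, and the mismatch penalty is scaled by the probability that the base call is correct, derived from its Phred quality. The names `mismatch_penalty` and `score_ungapped` and the constants are hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one represents
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatch_penalty(phred, max_penalty=6.0):
    """Scale the penalty by the probability that the base call is correct.

    A Phred score Q corresponds to an error probability of 10^(-Q/10),
    so low-quality mismatches are penalized less (illustrative scheme).
    """
    p_error = 10 ** (-phred / 10)
    return max_penalty * (1.0 - p_error)


def score_ungapped(read, quals, ref, match=1.0):
    """Score a read against an equal-length reference window, without gaps."""
    score = 0.0
    for base, q, ref_code in zip(read, quals, ref):
        if base in IUPAC[ref_code]:
            score += match  # read base is compatible with the reference code
        else:
            score -= mismatch_penalty(q)
    return score


# 'R' in the reference stands for A or G, so the read's G is not penalized
ambiguous = score_ungapped("ACGT", [30, 30, 30, 30], "ACRT")
# The same mismatch costs less when the read base has low quality
low_q = score_ungapped("ACGT", [30, 30, 10, 30], "ACTT")
high_q = score_ungapped("ACGT", [30, 30, 30, 30], "ACTT")
print(ambiguous, low_q, high_q)
```

Encoding known variant positions as IUPAC codes in the reference means that reads carrying the alternate allele are not scored as mismatches, which is one way reference bias can be reduced.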

