novoSplice: A gene, splice and quality aware RNA-seq aligner

novoSplice is an RNA-seq aligner that uses gene sequences as search windows. This reduced indexing approach allows efficient, on-the-fly, hash-based indexing of genes, and it enables novoSplice to identify complicated splice junctions, count features while aligning reads, and penalize alignments differently depending on the gene biotype. Read base qualities are used both in the alignment process and in refining the alignment score. To reduce reference bias, we implemented support for ambiguous (IUPAC) bases during indexing. To map single-end or paired-end RNA-seq reads, novoSplice uses a two-pass approach. First, a seed-and-vote algorithm finds a set of candidate genes. Then, for each hit, a fast splice-aware chaining algorithm is called to obtain the best alignment of the read sequence. If chaining fails, a modified splice-aware Needleman–Wunsch algorithm produces the final alignment. We benchmarked novoSplice against other state-of-the-art RNA-seq aligners following SimBA [1]. The SimBA benchmark covers read mapping to the genome, splice junction alignment, single nucleotide variants and indels, and fusions, based on simulated human RNA-seq reads (101 bp paired-end) for normal and somatic-mutation samples. We extended its scope to include additional RNA-seq aligners and simulated RNA-seq reads (100 bp and 150 bp paired-end) from multiple species (Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae). We are currently benchmarking five aligners: novoSplice, novoAlign, STAR [2], HISAT2 [3] and GSNAP [4]. In our preliminary benchmarking on the SimBA somatic and normal 101 bp datasets, novoSplice v0.4.1 consistently leads on both datasets.
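The candidate-gene step described above can be illustrated with a minimal seed-and-vote sketch. This is not novoSplice's implementation: the function names, the fixed k-mer seed length, the `min_votes` threshold, and the toy gene sequences are all assumptions made for illustration. Each gene is hashed into k-mers, and every k-mer of a read casts one vote for each gene that contains it.

```python
from collections import Counter, defaultdict


def build_gene_index(genes, k):
    """Hash every k-mer of every gene to the set of genes containing it.

    Hypothetical helper: a plain dict of sets stands in for whatever
    hash-based gene index the aligner actually builds.
    """
    index = defaultdict(set)
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_genes(read, index, k, min_votes=2):
    """Let each read k-mer (seed) vote for the genes it occurs in.

    Returns (gene, votes) pairs sorted by decreasing vote count,
    keeping only genes with at least `min_votes` votes.
    """
    votes = Counter()
    for i in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for name in index.get(read[i:i + k], ()):
            votes[name] += 1
    return [(name, n) for name, n in votes.most_common() if n >= min_votes]


# Toy data (invented for this example)
genes = {
    "geneA": "ACGTACGTTAGCCGATAGGCT",
    "geneB": "TTGACCGGTAAACCCGGGTTT",
}
index = build_gene_index(genes, k=5)
hits = candidate_genes("TAGCCGATAGG", index, k=5)
print(hits)  # [('geneA', 7)]
```

In a two-pass design like the one described, each gene returned here would then be handed to the chaining step for a full alignment.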
Extending the SimBA study to more datasets from different species and read lengths will let us further investigate novoSplice's performance relative to other aligners, within the limits of simulated reads. novoSplice is in beta and is undergoing further performance tuning of its paired-end and single-end short-read routines. Support for long reads and single-cell RNA-seq, and additional features such as detection of gene fusions, back-splicing, and trans-splicing, will be available in the near future. novoSplice will be available at http://www.novocraft.com/products/novosplice for both Linux and macOS platforms.
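The abstract states that base qualities and ambiguous (IUPAC) reference bases both feed into alignment scoring, without giving the scoring function. The sketch below shows one plausible way to combine them, under assumptions of our own: the match and mismatch values are arbitrary, and scaling the mismatch penalty by the probability that the base call is correct is an illustrative choice, not the published method. The IUPAC code table and the Phred-to-error conversion are standard.

```python
# Standard IUPAC nucleotide codes mapped to the bases they represent
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def column_score(ref_base, read_base, qual, match=1.0, mismatch=-3.0):
    """Score one aligned column (assumed scheme, for illustration only).

    A read base compatible with the (possibly ambiguous) reference code
    counts as a match. A mismatch is penalized in proportion to the
    confidence of the base call, so low-quality mismatches cost less.
    """
    if read_base in IUPAC[ref_base]:
        return match
    return mismatch * (1.0 - phred_to_error(qual))


def ungapped_score(ref, read, quals):
    """Sum the column scores of an ungapped alignment."""
    return sum(column_score(r, b, q) for r, b, q in zip(ref, read, quals))


# 'R' in the reference accepts both A and G, so this read matches fully.
print(ungapped_score("ACRT", "ACGT", [30, 30, 30, 30]))  # 4.0
```

With this scheme, a known variant encoded as an ambiguous base in the reference does not penalize reads carrying either allele, which is the reference-bias reduction the abstract refers to.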

F1000 Research Ltd

Fadel Berakdar, Su Wei Chong, Akzam Saidin, Colin Hercus

2025
