
Species-specific basecallers improve actual accuracy of nanopore sequencing in plants

Abstract

Background: Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modifications to be directly sequenced, but can be limited by lower per-base accuracies. A key step post-sequencing is basecalling, the process of converting raw electrical signals produced by the sequencing device into nucleotide sequences. This is challenging as current basecallers are primarily based on mixtures of model species for training. Here we utilise both ONT PromethION and higher accuracy PacBio Sequel II HiFi sequencing on two plants, Phebalium stellatum and Xanthorrhoea johnsonii, to train species-specific basecaller models with the aim of improving per-base accuracy. We investigate sequencing accuracies achieved by ONT basecallers and assess accuracy gains by training single-species and species-specific basecaller models. We also evaluate accuracy gains from ONT’s improved flowcells (R10.4, FLO-PRO112) and sequencing kits (SQK-LSK112). For the truth dataset for both model training and accuracy assessment, we developed highly accurate, contiguous diploid reference genomes with PacBio Sequel II HiFi reads.

Results: Basecalling with ONT Guppy 5 and 6 super-accurate gave almost identical results, attaining read accuracies of 91.96% and 94.15%. Guppy’s plant-specific model gave highly mixed results, attaining read accuracies of 91.47% and 96.18%. Species-specific basecalling models improved read accuracy, attaining 93.24% and 95.16% read accuracies. R10.4 sequencing kits also improve sequencing accuracy, attaining read accuracies of 95.46% (super-accurate) and 96.87% (species-specific).

Conclusions: The use of a single mixed-species basecaller model, such as ONT Guppy super-accurate, may be reducing the accuracy of nanopore sequencing, due to conflicting genome biology within the training dataset and study species. Training of single-species and genome-specific basecaller models improves read accuracy. Studies that aim to do large-scale long-read genotyping would primarily benefit from training their own basecalling models. Such studies could use sequencing accuracy gains and improving bioinformatics tools to improve study outcomes.
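The abstract reports read accuracies but this listing does not state how they were computed. One common definition, assumed here purely for illustration and not confirmed as the authors' method, is alignment identity: matching bases divided by all aligned columns (matches, mismatches, insertions and deletions) when a read is aligned to the truth reference. A minimal Python sketch, assuming an extended CIGAR string that uses `=` and `X` (the function name `read_accuracy` is hypothetical):

```python
import re

# Extended CIGAR operations: '=' match, 'X' mismatch, 'I' insertion, 'D' deletion.
# Soft/hard clips (S/H), skips (N) and padding (P) are ignored in this sketch.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_accuracy(cigar: str) -> float:
    """Return matches / (matches + mismatches + insertions + deletions).

    Assumption: the aligner emitted '=' and 'X' rather than the ambiguous 'M',
    which does not distinguish matches from mismatches.
    """
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in _CIGAR_OP.findall(cigar):
        if op == "M":
            raise ValueError("ambiguous 'M' operation; extended CIGAR with =/X is required")
        if op in counts:
            counts[op] += int(length)
    aligned = sum(counts.values())
    if aligned == 0:
        raise ValueError("CIGAR string contains no aligned columns")
    return counts["="] / aligned


if __name__ == "__main__":
    # 90 matches, 4 mismatches, 3 inserted and 3 deleted bases; 5 soft-clipped bases ignored.
    print(f"{read_accuracy('5S90=4X3I3D'):.2%}")
```

Under this definition, a read whose alignment has 90 matching columns out of 100 aligned columns scores 90%. Other definitions (for example, excluding indels, or taking a median over reads) would give different figures, so the numbers above should only be compared within one definition.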

Related Results

Plant species-specific basecaller improves actual accuracy of nanopore sequencing
Abstract Background Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modifications to be directly sequen...
Barcode-free multiplex plasmid sequencing using Bayesian analysis and nanopore sequencing
Abstract Plasmid construction is central to life science research, and sequence verification is arguably its costliest step. Long-read sequencing has emerged as a competitor to San...
Impacts of man-made structures on marine biodiversity and species status - native & non-native species
Coastal environments are exposed to anthropogenic activities such as frequent marine traffic and restructuring, i.e., addition, removal or replacing with man-made structur...
Nanopore Technology and Its Applications in Gene Sequencing
In recent years, nanopore technology has become increasingly important in the field of life science and biomedical research. By embedding a nano-scale hole in a thin membrane and m...
Introduction
Nanopore electrochemistry refers to the promising measurement science based on elaborate pore structures that offer a well-defined geometric confined space to adopt and characteriz...
Direct oligonucleotide sequencing with nanopores v1
Third-generation DNA sequencing has enabled users to sequence long, unamplified DNA fragments with minimal steps. Direct sequencing of ssDNA or RNA gives valuable insights like bas...
