Plant species-specific basecaller improves actual accuracy of nanopore sequencing
Abstract
Background
Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modifications to be sequenced directly, but can be limited by lower per-base accuracies. A key post-sequencing step is basecalling, the process of converting the raw electrical signals produced by the sequencing device into nucleotide sequences. This is challenging because current basecallers are trained primarily on mixtures of model species. Here we utilise both ONT PromethION and higher-accuracy PacBio Sequel II HiFi sequencing on two plants, Phebalium stellatum and Xanthorrhoea johnsonii, to train species-specific basecaller models with the aim of improving per-base accuracy. We investigate the sequencing accuracies achieved by ONT basecallers and assess the accuracy gains from training single-species and species-specific basecaller models. We also evaluate accuracy gains from ONT’s improved flowcells (R10.4, FLO-PRO112) and sequencing kits (SQK-LSK112). As the truth dataset for both model training and accuracy assessment, we developed highly accurate, contiguous diploid reference genomes from PacBio Sequel II HiFi reads.
Results
Basecalling with ONT Guppy 5 and 6 super-accurate gave almost identical results, attaining read accuracies of 91.96% and 94.15%. Guppy’s plant-specific model gave highly mixed results, attaining read accuracies of 91.47% and 96.18%. Species-specific basecalling models improved read accuracy, attaining read accuracies of 93.24% and 95.16%. R10.4 sequencing kits also improved sequencing accuracy, attaining read accuracies of 95.46% (super-accurate) and 96.87% (species-specific).
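The R10.4 figures above can also be read as a relative reduction in read errors rather than an absolute gain in accuracy. The following is a minimal sketch, not part of the study's pipeline: the function name `relative_error_reduction` is hypothetical, and it assumes the error rate is simply 100% minus the reported read accuracy.

```python
def relative_error_reduction(baseline_acc_pct: float, improved_acc_pct: float) -> float:
    """Return the fraction of baseline read errors removed by the improved model.

    Assumption (not stated in the abstract): error rate = 100% - read accuracy.
    """
    baseline_err = 100.0 - baseline_acc_pct
    improved_err = 100.0 - improved_acc_pct
    if baseline_err <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return (baseline_err - improved_err) / baseline_err


# R10.4 read accuracies reported in the Results:
# super-accurate model vs. species-specific model.
r10_reduction = relative_error_reduction(95.46, 96.87)
print(f"R10.4 species-specific vs super-accurate: {r10_reduction:.1%} fewer read errors")
```

Under that assumption, an absolute gain of 1.41 percentage points corresponds to removing roughly three in ten of the remaining read errors, which may be the more relevant figure for downstream genotyping.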
Conclusions
The use of a single mixed-species basecaller model, such as ONT Guppy super-accurate, may reduce the accuracy of nanopore sequencing because of conflicting genome biology between the training dataset and the study species. Training single-species and genome-specific basecaller models improves read accuracy. Studies that aim to do large-scale long-read genotyping would primarily benefit from training their own basecalling models. Such studies could combine these sequencing accuracy gains with improving bioinformatics tools to improve study outcomes.
Research Square Platform LLC
Related Results
Species-specific basecallers improve actual accuracy of nanopore sequencing in plants
Abstract
Background
Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modification...
238. Direct identification of Bacterial Species with MinION Nanopore Sequencer In Clinical Specimens Suspected of Polybacterial Infection
Abstract
Background
Conventional culture tests usually identify only a few bacterial species, which can grow well in the culture...
Nanocall: an open source basecaller for Oxford Nanopore sequencing data
Abstract
Motivation
The highly portable Oxford Nanopore MinION sequencer has enabled new applications of genome sequencin...
Nanocall: An Open Source Basecaller for Oxford Nanopore Sequencing Data
ABSTRACT
Motivation
The highly portable Oxford Nanopore MinION sequencer has enabled new applications of genome sequencing dire...
Pipeline for species-resolved full-length 16S rRNA amplicon nanopore sequencing analysis of low-complexity bacterial microbiota
Abstract
16S rRNA amplicon sequencing is a fundamental tool for characterizing prokaryotic microbial communities. While short-read 16S rRNA sequencing is a proven s...
Monitoring airborne pathogens by nanopore sequencing
Next generation sequencing technologies have revolutionized the field of environmental science. Widely used short-read sequencing enables accurate microbial identification but is o...
A Tear-Based Approach for Rapid Identification of Bacterial Pathogens in Corneal Ulcers Using Nanopore Sequencing
Abstract
Purpose
Corneal ulcers pose a significant threat to vision, with the need for prompt and precise pathogen identificati...
Diversity of Plant community in Satun Geopark
Background and Objectives: The diversity of species and plant communities varies among the areas. Understanding of species and their habitats is vital on conservation and sustainab...

