Pheniqs 2.0: accurate, high performance Bayesian decoding and confidence estimation for combinatorial barcode indexing

Abstract

Background
Systems biology increasingly relies on deep sequencing with combinatorial index tags to associate biological sequences with their sample, cell, or molecule of origin. Accurate data interpretation depends on the ability to classify sequences based on correct decoding of these combinatorial barcodes. The probability of correct decoding is influenced by both sequence quality and the number and arrangement of barcodes. The rising complexity of experimental designs calls for a probability model that accounts for both sequencing errors and random noise, generalizes to multiple combinatorial tags, and can handle any barcoding scheme. The needs for reproducibility and community benchmark standards demand a peer-reviewed tool that preserves decoding quality scores and provides tunable control over classification confidence that balances precision and recall. Moreover, continuous improvements in sequencing throughput require a fast, parallelized and scalable implementation.

Results
We developed a flexible, robustly engineered software that performs probabilistic decoding and supports arbitrarily complex barcoding designs. Pheniqs computes the full posterior decoding error probability of observed barcodes by consulting basecalling quality scores and prior distributions, and reports sequences and confidence scores in Sequence Alignment/Map (SAM) fields. The product of posteriors for multiple independent barcodes provides an overall confidence score for each read. Pheniqs achieves greater accuracy than minimum edit distance or simple maximum likelihood estimation, and it scales linearly with core count to enable the classification of >11 billion reads in 1h15m using <50 megabytes of memory. Pheniqs has been in production use for seven years in our genomics core facility.

Conclusions
We introduce a computationally efficient software that implements both probabilistic and minimum distance decoders and show that decoding barcodes using posterior probabilities is more accurate than available methods. Pheniqs allows fine-tuning of decoding sensitivity using intuitive confidence thresholds and is extensible with alternative decoders and new error models. Any arbitrary arrangement of barcodes is easily configured, enabling computation of combinatorial confidence scores for any barcoding strategy. An optimized multithreaded implementation assures that Pheniqs is faster and scales better with complex barcode sets than existing tools. Support for POSIX streams and multiple sequencing formats enables easy integration with automated analysis pipelines.
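
The abstract describes the decoding idea only in words, so a minimal sketch may help make it concrete. The code below is not Pheniqs's implementation. It assumes a simple error model in which a miscalled base is equally likely to be any of the other three nucleotides, a uniform random-sequence model for noise, and hypothetical function names (`phred_to_error`, `likelihood`, `decode`, `combined_confidence`) chosen for illustration. It shows how basecalling quality scores and priors could yield a posterior for one barcode, and how the posteriors of independent barcodes multiply into a per-read confidence.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def likelihood(observed, qualities, barcode):
    """P(observed | barcode), assuming independent per-base errors and
    that an error yields each of the 3 other bases with equal chance."""
    p = 1.0
    for base, q, expected in zip(observed, qualities, barcode):
        e = phred_to_error(q)
        p *= (1.0 - e) if base == expected else e / 3.0
    return p


def decode(observed, qualities, priors, noise_prior):
    """Return (best_barcode, posterior) for one observed barcode read.

    priors maps each candidate barcode to its prior probability;
    noise_prior is the assumed prior that the read is random noise
    (illustrative model: every sequence of this length is equally likely).
    """
    joint = {b: likelihood(observed, qualities, b) * p
             for b, p in priors.items()}
    noise = noise_prior * 0.25 ** len(observed)
    evidence = sum(joint.values()) + noise
    best = max(joint, key=joint.get)
    return best, joint[best] / evidence


def combined_confidence(posteriors):
    """Overall confidence for a read carrying several independent barcodes."""
    return math.prod(posteriors)


# Hypothetical example: two 4-base sample barcodes, 1% prior noise.
sample_priors = {"ACGT": 0.495, "TGCA": 0.495}
best, post = decode("ACGA", [30, 30, 30, 10], sample_priors, 0.01)
print(best, round(post, 4))
```

In this sketch the decoding error probability is one minus the returned posterior, and a read would be classified only if its combined confidence clears a user-chosen threshold. A low-quality mismatch (quality 10 in the example) costs little confidence, whereas a minimum-distance decoder treats all mismatches alike.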

Related Results

Pheniqs: Fast and flexible quality-aware sequence demultiplexing
Abstract. Motivation: Output from high throughput sequencing instruments often exceeds what is necessary to assay a single sample. To better utilize this capacity, multiple samples ar...
Improving Decodability of Polar Codes by Adding Noise
This paper presents an online perturbed and directed neural-evolutionary (Online-PDNE) decoding algorithm for polar codes, in which the perturbation noise and online directed neuro...
Sample-efficient Optimization Using Neural Networks
The solution to many science and engineering problems includes identifying the minimum or maximum of an unknown continuous function whose evaluation inflicts non-negligibl...
Figs S1-S9
Fig. S1. Consensus phylogram (50 % majority rule) resulting from a Bayesian analysis of the ITS sequence alignment of sequences generated in this study and reference sequences from...
Optimized Generalized LDPC Convolutional Codes
In this paper, some optimized encoding and decoding schemes are proposed for the generalized LDPC convolutional codes (GLDPC–CCs). In terms of the encoding scheme, a flexible dopin...
Bayesian estimation of the measurement of interactions in epidemiological studies
Background: Interaction identification is important in epidemiological studies and can be detected by including a product term in the model. However, as Rothman noted...
MARS-seq2.0: an experimental and analytical pipeline for indexed sorting combined with single-cell RNA sequencing v1
Human tissues comprise trillions of cells that populate a complex space of molecular phenotypes and functions and that vary in abundance by 4–9 orders of magnitude. Relying solely ...
