Estimated limits of organism-specific training for epitope prediction
Abstract
Background
The identification of linear B-cell epitopes remains an important task in the development of vaccines, therapeutic antibodies and several diagnostic tests. Machine learning predictors are trained to flag potential epitope candidates for experimental validation. Currently, most predictors are trained as generalist models using large, heterogeneous data sets. Recently, organism-specific training has been shown to improve prediction performance for data-rich organisms. Unfortunately, large volumes of validated epitope data are not yet available for most organisms. This article investigates the limits of organism-specific training for epitope prediction. It explores the validity of organism-specific training for data-poor organisms by examining how the size of the training data set affects prediction performance. It also compares the performance of organism-specific training under simulated data-poor conditions with that of models trained using traditional large heterogeneous and hybrid data sets.
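One way to picture the simulated data-poor conditions described above is to draw nested training subsets of increasing size from a single organism and train one model per subset. The following is a minimal sketch under stated assumptions, not the authors' protocol: the `Peptide` tuple layout, the function name `nested_subsets`, the organism identifiers and the toy records are all hypothetical and carry no real epitope data.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (peptide_id, organism_id, label), label 1 = epitope.
Peptide = Tuple[str, str, int]


def nested_subsets(
    peptides: Sequence[Peptide],
    organism: str,
    sizes: Sequence[int],
    seed: int = 0,
) -> Dict[int, List[Peptide]]:
    """Draw nested training subsets of increasing size from one organism.

    Each smaller subset is contained in every larger one, so differences
    between models trained on them reflect added data, not a fresh sample.
    """
    pool = [p for p in peptides if p[1] == organism]
    if not sizes or max(sizes) > len(pool):
        raise ValueError("requested sizes must be non-empty and fit the pool")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    rng.shuffle(pool)
    return {n: pool[:n] for n in sorted(sizes)}


# Toy, made-up records purely for illustration (400 peptides, two organisms).
toy = [
    (f"pep{i:03d}", "org_A" if i % 2 == 0 else "org_B", int(i % 3 == 0))
    for i in range(400)
]

subsets = nested_subsets(toy, "org_A", sizes=[20, 50, 100, 150], seed=1)
for n, subset in subsets.items():
    positives = sum(label for _, _, label in subset)
    print(f"{n} peptides, {positives} labelled as epitopes")
```

Each subset would then be used to train a separate model that is evaluated on the same held-out peptides from the target organism, which gives one point per training-set size on a learning curve. The sizes above are chosen only to echo the ranges mentioned in this abstract.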
Results
This work shows that models trained on small organism-specific data sets can outperform similar models trained on (potentially much larger) heterogeneous and mixed data sets. The results indicate that as few as 20 labelled peptides from a given pathogen can be sufficient to generate models that outperform widely used predictors from the literature, which are trained on heterogeneous data. Models trained using more than about 100 to 150 organism-specific peptides perform consistently better than most generalist models across a wide variety of performance measures, and in some cases can even approach the performance of organism-specific models trained on considerably larger data sets.
Conclusions
Organism-specific training improves linear B-cell epitope prediction performance even when only small training sets are available. This opens new possibilities for the development of bespoke, high-performance predictive models when studying data-poor organisms such as emerging or neglected pathogens.