
Estimated limits of organism-specific training for epitope prediction

Abstract

Background: The identification of linear B-cell epitopes remains an important task in the development of vaccines, therapeutic antibodies and several diagnostic tests. Machine learning predictors are trained to flag potential epitope candidates for experimental validation and currently, most predictors are trained as generalist models using large, heterogeneous data sets. Recently, organism-specific training has been shown to improve prediction performance for data-rich organisms. Unfortunately, for most organisms, large volumes of validated epitope data are not yet available. This article investigates the limits of organism-specific training for epitope prediction. It explores the validity of organism-specific training for data-poor organisms by examining how the size of the training data set affects prediction performance. It also compares the performance of organism-specific training under simulated data-poor conditions to that of models trained using traditional large heterogeneous and hybrid data sets.

Results: This work shows how models trained on small organism-specific data sets can outperform similar models trained on (potentially much larger) heterogeneous and mixed data sets. The results reported indicate that as few as 20 labelled peptides from a given pathogen can be sufficient to generate models that outperform widely-used predictors from the literature, which are trained on heterogeneous data. Models trained using more than about 100 to 150 organism-specific peptides perform consistently better than most generalist models across a wide variety of performance measures, and in some cases can even approach the performance of organism-specific models trained on considerably larger data sets.

Conclusions: Organism-specific training improves linear B-cell epitope prediction performance even in situations when only small training sets are available, which opens new possibilities for the development of bespoke, high-performance predictive models when studying data-poor organisms such as emerging or neglected pathogens.
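
The question of how training-set size affects prediction performance can be illustrated with a small learning-curve experiment. The sketch below is not the authors' pipeline. It assumes synthetic peptides, amino-acid composition features, a nearest-centroid classifier and the Matthews correlation coefficient (MCC) as the performance measure, all chosen here only to keep the example free of external dependencies. Every function name and the `ENRICHED` residue set are hypothetical.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ENRICHED = "DEKNPQ"  # assumption: residues made more common in toy "epitopes"


def make_synthetic_peptides(n, rng, length=15):
    """Return n (peptide, label) pairs of made-up data; label 1 = toy epitope."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        weights = [3.0 if label and a in ENRICHED else 1.0 for a in AMINO_ACIDS]
        peptide = "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))
        data.append((peptide, label))
    return data


def composition(peptide):
    """Amino-acid frequency vector (20 values) for one peptide."""
    counts = Counter(peptide)
    return [counts[a] / len(peptide) for a in AMINO_ACIDS]


def train_centroids(train):
    """Nearest-centroid 'training': mean composition vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [composition(p) for p, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, peptide):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    x = composition(peptide)
    dist = {y: sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in centroids.items()}
    return min(dist, key=dist.get)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def learning_curve(pool, test, sizes, repeats, rng):
    """Mean test MCC for class-balanced training subsets of each size."""
    by_label = {y: [item for item in pool if item[1] == y] for y in (0, 1)}
    y_true = [y for _, y in test]
    curve = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            # Stratified subsample so both classes are always represented.
            train = rng.sample(by_label[1], n // 2) + rng.sample(by_label[0], n - n // 2)
            centroids = train_centroids(train)
            y_pred = [predict(centroids, p) for p, _ in test]
            scores.append(mcc(y_true, y_pred))
        curve[n] = sum(scores) / repeats
    return curve


rng = random.Random(0)
pool = make_synthetic_peptides(400, rng)   # stands in for one organism's labelled peptides
test = make_synthetic_peptides(400, rng)   # held-out peptides from the same toy "organism"
curve = learning_curve(pool, test, sizes=[20, 50, 100, 150], repeats=10, rng=rng)
for n, score in curve.items():
    print(f"{n:>4} training peptides: mean MCC = {score:.3f}")
```

With real data, `pool` would hold validated peptides from a single organism and `test` a held-out set from the same organism. The scores printed here reflect only the toy data and say nothing about the results reported in the article.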

Related Results

VaccineDesigner: A Web-based Tool for Streamlined Multi-epitope Vaccine Design
Abstract Background Multi-epitope vaccines have become the preferred strategy for protection against infectious diseases by int...
VaccineDesigner: A Web-Based Tool for Streamlined Multi-Epitope Vaccine Design
Background: Multi-epitope vaccines have become the preferred strategy for protection against infectious diseases by integrating multiple MHC-restricted T-cell and B-cell epitopes t...
Data curation to improve the pattern recognition performance of B-cell epitope prediction by support vector machine
Abstract B-cell epitope will be recognized and attached to the surface of receptors in B-lymphocytes to trigger immune response, thus are the vital elements in the f...
Characterization of linear epitope specificity of antibodies potentially contributing to spontaneous clearance of hepatitis C virus
Background Around 30% of the HCV infected patients can spontaneously clear the virus. Cumulative evidence suggests the role of neutralizing antibodies in such spontaneous resolutio...
Predicting TCR sequences for unseen antigen epitopes using structural and sequence features
Abstract T-cell receptor (TCR) recognition of antigens is fundamental to the adaptive immune response. With the expansion of experimental techniques, a substantial database...
Trooping the (School) Colour
Introduction Throughout the early and mid-twentieth century, cadet training was a feature of many secondary schools and educational establishments across Australia, with countless ...
AI driven approaches in Nanobody Epitope Prediction: Are We There Yet?
ABSTRACT Nanobodies have emerged as a versatile class of biologics with promising therapeutic applications, driving the need for robust tools to predict their epito...
