
Estimated limits of organism-specific training for epitope prediction

Abstract

Background: The identification of linear B-cell epitopes remains an important task in the development of vaccines, therapeutic antibodies and several diagnostic tests. Machine learning predictors are trained to flag potential epitope candidates for experimental validation. Currently, most predictors are trained as generalist models using large, heterogeneous data sets. Recently, organism-specific training has been shown to improve prediction performance for data-rich organisms. Unfortunately, for most organisms, large volumes of validated epitope data are not yet available. This article investigates the limits of organism-specific training for epitope prediction. It explores the validity of organism-specific training for data-poor organisms by examining how the size of the training data set affects prediction performance. It also compares the performance of organism-specific training under simulated data-poor conditions with that of models trained using traditional large heterogeneous and hybrid data sets.

Results: This work shows that models trained on small organism-specific data sets can outperform similar models trained on (potentially much larger) heterogeneous and mixed data sets. The reported results indicate that as few as 20 labelled peptides from a given pathogen can be sufficient to generate models that outperform widely used predictors from the literature, which are trained on heterogeneous data. Models trained using more than about 100 to 150 organism-specific peptides perform consistently better than most generalist models across a wide variety of performance measures, and in some cases can even approach the performance of organism-specific models trained on considerably larger data sets.

Conclusions: Organism-specific training improves linear B-cell epitope prediction performance even when only small training sets are available. This opens new possibilities for the development of bespoke, high-performance predictive models when studying data-poor organisms such as emerging or neglected pathogens.
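The kind of experiment the abstract describes, examining how training set size affects prediction performance, can be illustrated with a learning curve. The following is only a minimal sketch under loud assumptions: the data are synthetic Gaussian feature vectors standing in for labelled peptides, the classifier is a simple nearest-centroid rule, and balanced accuracy is used as a single example measure. None of these choices come from the article, and all function names are hypothetical.

```python
import random
import statistics


def make_peptides(n, shift, rng, dim=8):
    """Synthetic stand-in for labelled peptides (an assumption, not real data).

    Returns (feature_vector, label) pairs with alternating labels; positives
    are drawn around +shift and negatives around -shift in every dimension.
    """
    data = []
    for i in range(n):
        label = i % 2
        centre = shift if label == 1 else -shift
        data.append(([rng.gauss(centre, 1.0) for _ in range(dim)], label))
    return data


def fit_centroids(train):
    """Nearest-centroid model: the mean feature vector of each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def balanced_accuracy(centroids, test):
    """Mean of the per-class recalls on a held-out test set."""
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for x, y in test:
        total[y] += 1
        correct[y] += int(predict(centroids, x) == y)
    return 0.5 * (correct[0] / total[0] + correct[1] / total[1])


def learning_curve(pool, test, sizes, repeats, rng):
    """Average test performance for class-balanced subsamples of each size."""
    positives = [p for p in pool if p[1] == 1]
    negatives = [p for p in pool if p[1] == 0]
    curve = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            subsample = rng.sample(positives, n // 2) + rng.sample(negatives, n - n // 2)
            scores.append(balanced_accuracy(fit_centroids(subsample), test))
        curve[n] = statistics.fmean(scores)
    return curve


rng = random.Random(0)
pool = make_peptides(400, shift=0.5, rng=rng)       # simulated organism-specific pool
held_out = make_peptides(200, shift=0.5, rng=rng)   # simulated hold-out set
curve = learning_curve(pool, held_out, sizes=[20, 50, 100, 150], repeats=10, rng=rng)
for size, score in curve.items():
    print(f"{size:>4} training peptides: balanced accuracy {score:.3f}")
```

Repeating each subsample size several times and averaging reduces the noise that a single small random draw would introduce. A fuller comparison would also score a model trained on a heterogeneous pool on the same hold-out set.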

openRxiv

Jodie Ashford, Felipe Campelo

2021



