Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors
Abstract
Background
In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which is responsible for AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, correct and timely prediction of the cleavage site of HIV-1 protease can significantly speed up and optimize the discovery of novel HIV-1 protease inhibitors. In this work, we built and compared selected machine learning models for predicting HIV-1 protease cleavage sites, using as input variables a hybrid of numerical descriptors derived from octapeptide sequence information: bond composition, amino acid binary profile (AABP), and physicochemical properties. Our work differs from antecedent studies on the same subject in the combination of octapeptide descriptors and in the method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation technique alongside the testing set to evaluate the models.
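As a rough illustration of two ideas named in the background, the sketch below encodes an octapeptide as a binary profile and assigns samples to stratified folds. It assumes that AABP means a one-hot encoding over the 20 standard amino acids (giving 8 × 20 = 160 features); the abstract does not spell out the encoding, the residue ordering, or the splitting code, so `aabp_encode`, `stratified_folds`, the example peptide, and the toy labels are all hypothetical and not the authors' implementation.

```python
import random
from collections import defaultdict

# Assumed residue ordering: the 20 standard amino acids by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aabp_encode(octapeptide):
    """One-hot encode an 8-residue peptide into 8 * 20 = 160 binary features."""
    if len(octapeptide) != 8:
        raise ValueError("expected an octapeptide (8 residues)")
    features = []
    for residue in octapeptide:
        row = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for non-standard residues.
        row[AMINO_ACIDS.index(residue)] = 1
        features.extend(row)
    return features


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class proportions
    roughly equal across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Illustrative octapeptide only; not taken from the study's dataset.
vector = aabp_encode("SQNYPIVQ")

# Toy, imbalanced labels (1 = cleaved, 0 = not cleaved); not real data.
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, k=10)

print(len(vector), sum(vector))
print([sum(labels[i] for i in fold) for fold in folds])
```

Each fold here receives the same share of the minority class, which is the property that makes stratification useful when cleaved and non-cleaved octapeptides are imbalanced.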
Results
Among the 8 models evaluated in the “stratified” 10-fold CV experiment, logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier, with AUC, F-score, and B. Acc. scores in the ranges of 0.91–0.96, 0.81–0.88, and 80.1–86.4%, respectively, have the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80, and B. Acc. ~80.0%). In contrast, the perceptron classifier and K-nearest neighbors had statistically lower performance (AUC 0.77–0.82, F-score 0.53–0.69, and B. Acc. 60.0–68.5%) at p < 0.05. On further evaluation on the testing set, logistic regression and the multi-layer perceptron classifier (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%) had the best performance, though linear discriminant analysis, gradient boosting classifier, and Naive Bayes classifier also performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%).
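For readers unfamiliar with the three reported metrics, the following minimal sketch shows how they can be computed for a binary classifier. It assumes that "B. Acc." denotes balanced accuracy (the mean of sensitivity and specificity) and that "F-score" denotes the F1 score; the abstract does not define either, and the function names and toy values below are hypothetical and unrelated to the study's results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (assumes both classes occur)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (assumes some predicted positives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and scores for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(balanced_accuracy(y_true, y_pred), 3))
print(round(f1_score(y_true, y_pred), 3))
print(round(auc(y_true, scores), 3))
```

Balanced accuracy and AUC are less sensitive to class imbalance than plain accuracy, which is presumably why they are reported alongside the F-score.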
Conclusions
Logistic regression and multi-layer perceptron classifiers have comparable predictive performances to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition and standard physicochemical properties are used as input variables. In our future work, we hope to develop a standalone software for HIV-1 protease cleavage site prediction utilizing the linear regression algorithm and the aforementioned octapeptide sequence descriptors.
Springer Science and Business Media LLC
Related Results
Prediction of HIV-1 Protease Cleavage Site from Octapeptide Sequence Information using Selected Classifiers and Hybrid Descriptors
Background: In most parts of the world, especially in underdeveloped countries, Acquired Immunodeficiency Syndrome (AIDS) still remains a major cause of death, dis...
Chapter 6 – HIV-AIDS: how to treat it, and what to do and not to do during treatment?
HIV infection can occur in several ways, the main route being sexual transmission through unprotected sex. The HIV virus remains in an incubation period...
Laboratory-based Evaluation of Wondfo HIV1/2 Rapid Test Kits in the Gambia, December 2020
Background: HIV rapid diagnosis in The Gambia is mainly done using Determine HIV-1/2 and First Response HIV 1.2.0 or SD Bioline HIV-1/2 3.0 for screening and sero-typing of HIV res...
Impact of HIV/AIDS scale-up on non-HIV priority services in Nyanza Province, Kenya
Background: The HIV pandemic has attracted unprecedented scale-up in resources to curb its escalation and manage those afflicted. Although evidence from developing countries sugges...
How to find simple and accurate rules for viral protease cleavage specificities
Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this know...
Stigma Kills
Stigma due to an HIV diagnosis is a well-known phenomenon and is a major barrier to accessing care. Over the last forty years, HIV has been transformed from a fatal disease to a ma...
The relationship between HIV-related stigma and HIV self-management among men who have sex with men: The chain mediating role of social support and self-efficacy
HIV infection becomes a manageable disease, and self-management is one of the key indicators of achieving optimal health outcomes. Men who have sex with men (MSM) living with HIV f...
CD4+ T cell count and HIV-1 viral load dynamics positively impacted by H. pylori infection in HIV-positive patients regardless of ART status in a high-burden setting
Background: There is a widespread co-infection of HIV and Helicobacter pylori (H. pylori) globally, particularly in developing countries, an...

