Ensemble Machine Learning Approaches for Proteogenomic Cancer Studies
Abstract
Background: The identification of important proteins is critical for medical diagnosis and prognosis in common diseases. Diverse computational tools have been developed for omics data reduction and protein selection. However, standard statistical models with single feature selection carry a multiple-testing burden and have low power when only limited samples are available. Furthermore, high correlations among proteins, with high redundancy and moderate effects, often lead to unstable selections and cause reproducibility issues. Ensemble feature selection in machine learning may identify a stable set of disease biomarkers that could improve the prediction performance of subsequent classification models and thereby simplify their interpretation. In this study, we developed a three-stage homogeneous ensemble feature selection approach for both identifying proteins and improving prediction accuracy. This approach was implemented and applied to ovarian cancer proteogenomics data sets with two outcomes: 1) binary putative homologous recombination deficiency (positive or negative); and 2) multiple mRNA classes (differentiated, proliferative, immunoreactive, mesenchymal, and unknown). We compared various machine learning approaches with homogeneous ensemble feature selection, including random forest, support vector machine, and neural network, for predicting both binary and multiple-class outcomes. Various performance criteria, including sensitivity, specificity, and kappa statistics, were used to assess prediction consistency and accuracy.
Results: With the proposed three-stage homogeneous ensemble feature selection approach, prediction accuracy can be improved with limited samples by continuously reducing errors and redundancy; for example, Treebag provided 83% prediction accuracy (85% sensitivity and 81% specificity) for the binary ovarian outcome. For mRNA multi-class classification, our approach provided even better accuracy with increased sample size.
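The performance criteria named in the abstract (accuracy, sensitivity, specificity, and the kappa statistic) can all be derived from a binary confusion matrix. The sketch below shows one way to compute them. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity, and Cohen's kappa
    from the four cells of a binary confusion matrix.

    Assumes every denominator is non-zero (both classes are present
    and the classifier is not perfectly predictable by chance).
    """
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n          # observed agreement
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate

    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((tn + fp) / n) * ((tn + fn) / n)
    p_expected = p_pos + p_neg

    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = binary_metrics(tp=8, fn=2, tn=7, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Kappa corrects the observed accuracy for the agreement expected by chance, which is why it is often reported next to sensitivity and specificity when sample sizes are small.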
Conclusions: Despite the different prediction accuracies of the various models, the proposed homogeneous ensemble feature selection identified consistent sets of top-ranked important markers, out of 9606 proteins, linked to the binary disease and multiple mRNA class outcomes.
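The abstract does not spell out the three stages, the selector, or the aggregation rule, so the following is only a generic sketch of homogeneous ensemble feature selection, not the authors' method. It applies one simple scoring rule to many bootstrap resamples and averages the resulting ranks, which is the usual way such ensembles stabilise a selection. The scoring rule (absolute difference in class means), the function names, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def score_features(X, y):
    """Score each feature by the absolute difference in class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def ranks_from_scores(scores):
    """Convert scores to ranks, where rank 1 is the highest score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def ensemble_select(X, y, n_boot=50, top_k=2, seed=0):
    """Homogeneous ensemble: same selector on each bootstrap resample,
    aggregated by mean rank. Returns the indices of the top_k features."""
    rng = random.Random(seed)
    n = len(X)
    all_ranks = []
    while len(all_ranks) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip resamples that lost one of the classes
        Xb = [X[i] for i in idx]
        all_ranks.append(ranks_from_scores(score_features(Xb, yb)))
    mean_rank = [mean(r[j] for r in all_ranks) for j in range(len(X[0]))]
    return sorted(range(len(mean_rank)), key=lambda j: mean_rank[j])[:top_k]


# Toy data (hypothetical): feature 0 separates the classes strongly,
# feature 1 is constant, feature 2 separates them moderately.
X = [
    [0, 5, 1], [0, 5, 2], [0, 5, 1], [0, 5, 2],
    [10, 5, 3], [10, 5, 4], [10, 5, 3], [10, 5, 4],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = ensemble_select(X, y)
print(selected)
```

Averaging ranks over resamples means a feature must score well repeatedly to be selected, which is what makes the selected set less sensitive to the particular samples available.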
Related Results
Are Cervical Ribs Indicators of Childhood Cancer? A Narrative Review
Abstract
A cervical rib (CR), also known as a supernumerary or extra rib, is an additional rib that forms above the first rib, resulting from the overgrowth of the transverse proce...
Edoxaban and Cancer-Associated Venous Thromboembolism: A Meta-analysis of Clinical Trials
Abstract
Introduction
Cancer patients face a venous thromboembolism (VTE) risk that is up to 50 times higher compared to individuals without cancer. In 2010, direct oral anticoagul...
Cash‐based approaches in humanitarian emergencies: a systematic review
This Campbell systematic review examines the effectiveness, efficiency and implementation of cash transfers in humanitarian settings. The review summarises evidence from five studi...
Diagnostic Rate of the Cancer by BDORT Utilizing the Cancer Slide
Purpose:
To make a diagnosis of cancer with BDORT (resonance test), we can choose two methods. One is to use a chemical agent like Integrin α5β1 or Oncogene C-f...
Breast Carcinoma within Fibroadenoma: A Systematic Review
Abstract
Introduction
Fibroadenoma is the most common benign breast lesion; however, it carries a potential risk of malignant transformation. This systematic review provides an ove...
Advanced Machine Learning Techniques for Prognostic Analysis in Breast Cancer
Aims
The aim of this research is mainly to use machine learning methods for forecasting significant characteristics related to breast cancer using the data to f...
Abstract OI-1: OI-1 Decoding breast cancer predisposition genes
Abstract
Women with one or more first-degree female relatives with a history of breast cancer have a two-fold increased risk of developing breast cancer. This risk i...
Abstract 1624: Antigen-independent de novo prediction of cancer-associated TCR repertoire
Abstract
Cancer-associated T cells play a critical role in mediating immune responses in the anti-tumor immunity. However, due to the complex nature of cancer antige...

