
Ensemble Machine Learning Approaches for Proteogenomic Cancer Studies

Abstract

Background: Identifying important proteins is critical for medical diagnosis and prognosis in common diseases. Diverse computational tools have been developed for omics data reduction and protein selection. However, standard statistical models with single feature selection carry a multiple-testing burden and have low power with the limited samples available. Furthermore, high correlations among proteins with high redundancy and moderate effects often lead to unstable selections and cause reproducibility issues. Ensemble feature selection in machine learning may identify a stable set of disease biomarkers that could improve the prediction performance of subsequent classification models and thereby simplify their interpretation. In this study, we developed a three-stage homogeneous ensemble feature selection approach for both identifying proteins and improving prediction accuracy. The approach was implemented and applied to two ovarian cancer proteogenomic data sets: 1) a binary outcome, putative homologous recombination deficiency positive or negative; and 2) multiple mRNA classes (differentiated, proliferative, immunoreactive, mesenchymal, and unknown). We compared various machine learning approaches combined with homogeneous ensemble feature selection, including random forest, support vector machine, and neural network, for predicting both the binary and the multi-class outcomes. Performance criteria including sensitivity, specificity, and the kappa statistic were used to assess prediction consistency and accuracy.

Results: With the proposed three-stage homogeneous ensemble feature selection approach, prediction accuracy can be improved with limited samples by continuously reducing errors and redundancy; for example, Treebag provided 83% prediction accuracy (85% sensitivity and 81% specificity) for the binary ovarian cancer outcome. For mRNA multi-class classification, our approach provided even better accuracy with the increased sample size.

Conclusions: Despite the different prediction accuracies of the various models, the proposed homogeneous ensemble feature selection identified consistent sets of top-ranked important markers, out of 9606 proteins, linked to the binary disease outcome and the multiple mRNA class outcomes.
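
The abstract does not spell out the three stages, so the following is only a minimal sketch of the general idea behind homogeneous ensemble feature selection: one selector is applied to many bootstrap resamples of the same data, and features are ranked by how often they are selected. The scoring rule (absolute difference of class means), the function names `rank_features` and `ensemble_select`, the parameters `top_k` and `n_boot`, and the synthetic data are all illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter
from statistics import mean


def rank_features(rows, labels):
    """Rank feature indices by |mean(class 1) - mean(class 0)|, best first.

    This simple filter is a stand-in (assumption) for whatever base selector
    is actually used; a homogeneous ensemble only requires that it is the
    same selector on every resample.
    """
    n_features = len(rows[0])
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = [
        abs(mean(r[j] for r in pos) - mean(r[j] for r in neg))
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def ensemble_select(rows, labels, top_k=2, n_boot=50, seed=0):
    """Return each feature's selection frequency over bootstrap resamples."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(rows)
    votes = Counter()
    done = 0
    while done < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lost a class; draw again
        xs = [rows[i] for i in idx]
        votes.update(rank_features(xs, ys)[:top_k])
        done += 1
    return {j: votes[j] / n_boot for j in range(len(rows[0]))}


# Synthetic data (hypothetical): feature 0 carries the signal, 1-4 are noise.
data_rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [
    [3.0 * label + data_rng.gauss(0, 1)] + [data_rng.gauss(0, 1) for _ in range(4)]
    for label in y
]

freq = ensemble_select(X, y, top_k=2)
stable = sorted(freq, key=freq.get, reverse=True)
print("selection frequencies:", freq)
print("features ordered by stability:", stable)
```

Features that are selected in most resamples are the stable candidates; in a multi-stage pipeline, such a frequency ranking could feed a further redundancy filter before the classifier is trained.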

Springer Science and Business Media LLC

Yulan Liang Amin Gharipour Erik Kelemen Arpad Kelemen

2020



