DeEPsnap: human essential gene prediction by integrating multi-omics data
Abstract
Essential genes are necessary for the survival or reproduction of a living organism. Predicting and analyzing gene essentiality can advance our understanding of basic life processes and human diseases, and can support the development of new drugs. Wet lab methods for identifying essential genes are often costly, time-consuming, and laborious. As a complement, computational methods have been proposed that predict essential genes by integrating multiple biological data sources. Most of these methods are evaluated on model organisms. Prediction methods for human essential genes remain limited, and the relationship between human gene essentiality and different types of biological information still needs to be explored. In addition, it is worth exploring deep learning techniques that can overcome the limitations of traditional machine learning methods and improve prediction accuracy. We propose a snapshot ensemble deep neural network method, DeEPsnap, to predict human essential genes. DeEPsnap integrates sequence features derived from DNA and protein sequence data with features extracted or learned from multiple types of functional data, such as gene ontology, protein complexes, protein domains, and the protein-protein interaction network. More than 200 features are extracted or learned from these data and integrated to train a series of cost-sensitive deep neural networks using multiple deep learning techniques. The proposed snapshot mechanism allows multiple models to be trained without additional training effort or cost. The results of 10-fold cross-validation show that DeEPsnap can accurately predict human gene essentiality, with an average AUROC (Area Under the Receiver Operating Characteristic curve) of 96.1%, an average AUPRC (Area Under the Precision-Recall Curve) of 93.82%, an average accuracy of 92.21%, and an average F1 measure of about 80.62%. In addition, a comparison of experimental results shows that DeEPsnap outperforms several popular traditional machine learning and deep learning models. These results demonstrate that DeEPsnap is effective for predicting human essential genes.
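The snapshot idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption rather than the authors' implementation: the abstract does not state the learning-rate schedule, network architecture, or class weights, so the sketch uses a cyclic cosine-annealed learning rate (a common recipe for snapshot ensembles), a class-weighted logistic regression as a stand-in for a cost-sensitive deep neural network, and hypothetical function names. It shows only the mechanism: one training run, one saved model per learning-rate cycle, and predictions averaged over the saved models.

```python
import numpy as np

def cosine_cyclic_lr(step, steps_per_cycle, lr_max):
    """Cosine-annealed learning rate that restarts at lr_max every cycle."""
    t = (step % steps_per_cycle) / steps_per_cycle
    return 0.5 * lr_max * (1.0 + np.cos(np.pi * t))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_snapshots(X, y, n_cycles=3, steps_per_cycle=200, lr_max=0.5,
                    pos_weight=None):
    """Train one model in a single run and save a snapshot per cycle.

    The model is a class-weighted logistic regression, used here only as a
    small stand-in for a cost-sensitive network (assumption).
    """
    n_features = X.shape[1]
    if pos_weight is None:
        # Assumed weighting: up-weight the minority (positive) class by the
        # negative-to-positive ratio.
        pos_weight = (y == 0).sum() / max((y == 1).sum(), 1)
    sample_w = np.where(y == 1, pos_weight, 1.0)

    w = np.zeros(n_features)
    b = 0.0
    snapshots = []
    for step in range(n_cycles * steps_per_cycle):
        lr = cosine_cyclic_lr(step, steps_per_cycle, lr_max)
        p = sigmoid(X @ w + b)
        err = sample_w * (p - y)          # gradient of weighted log loss
        w -= lr * (X.T @ err) / sample_w.sum()
        b -= lr * err.sum() / sample_w.sum()
        if (step + 1) % steps_per_cycle == 0:
            # End of a cycle: the learning rate is near zero, so keep a copy.
            snapshots.append((w.copy(), b))
    return snapshots

def predict_ensemble(snapshots, X):
    """Average the predicted probabilities of all saved snapshots."""
    probs = [sigmoid(X @ w + b) for w, b in snapshots]
    return np.mean(probs, axis=0)
```

Because all snapshots come from the same training run, the ensemble costs no more to train than a single model with the same number of steps; only inference is repeated once per snapshot.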

