Integrated Bioinformatics and Ensemble Learning Reveal Diagnostic Modeling and Drug Discovery in Alzheimer’s Disease
Abstract
Background:
Alzheimer’s disease (AD) is driven by complex molecular and immune dysregulation, yet reliable diagnostic biomarkers and druggable targets remain limited. This study aimed to identify key AD-associated regulatory genes, characterize their immune and spatial expression features, and prioritize small-molecule compounds with therapeutic potential.
Methods:
Multiple AD-related transcriptomic datasets—including bulk RNA-seq, microarray, and spatial transcriptomic profiles—were retrieved from GEO and systematically partitioned into discovery (GSE5281, GSE66333), validation (GSE110226, GSE28146, GSE29378), independent testing (GSE29378), and spatial validation cohorts (GSE147047). Differential expression analysis and weighted gene co-expression network analysis (WGCNA) were used to construct co-expression networks and define AD-associated gene modules. Protein–protein interaction (PPI) analysis and multiple network centrality measures were then applied to prioritize candidate key genes. Twelve machine-learning algorithms were combined into 127 classification models, and SHAP-based interpretability analysis was used to quantify feature contributions and identify diagnostic genes. Single-cell and spatial transcriptomic data were further used to validate the cell type specificity and spatial localization of the hub genes. Drug–gene enrichment analysis (DSigDB), compound retrieval (PubChem), ADMET and drug-likeness profiling, and molecular blind docking were integrated to screen and evaluate potential lead compounds.
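The gene-prioritization steps described above (intersecting DEGs with co-expression module genes, then ranking the candidates by network centrality) can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses degree centrality only, as one simple example of a centrality measure, and the gene names (G1 to G6), the edge list, and the function names are hypothetical placeholders.

```python
from collections import defaultdict


def candidate_genes(degs, module_genes):
    """Intersect DEGs with genes from AD-associated co-expression modules."""
    return set(degs) & set(module_genes)


def rank_by_degree(edges, candidates, top_n):
    """Rank candidates by degree centrality in the PPI subnetwork they induce.

    Degree centrality here is (number of neighbours) / (n - 1), where n is
    the number of candidate genes.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        # Keep only interactions where both partners are candidates.
        if a in candidates and b in candidates and a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    denom = max(len(candidates) - 1, 1)
    scores = {g: len(neighbours[g]) / denom for g in candidates}
    # Sort by descending score, then by name so ties resolve deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical toy inputs, not data from the study.
degs = ["G1", "G2", "G3", "G4", "G5"]
module_genes = ["G1", "G2", "G3", "G4", "G6"]
ppi_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G5", "G6")]

candidates = candidate_genes(degs, module_genes)
top_genes = rank_by_degree(ppi_edges, candidates, top_n=2)
print(sorted(candidates))
print(top_genes)
```

In practice several centrality measures would be combined, as the abstract states; the sketch only shows the shape of the intersect-then-rank step.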
Results:
We identified 2,534 differentially expressed genes (DEGs) between AD and control samples, and their intersection with WGCNA-derived modules yielded 848 candidate genes. PPI-based network analysis prioritized 15 key genes, on which 127 machine-learning models were constructed; the random forest model achieved the best overall performance with an average AUC of 0.957. SHAP analysis identified 11 key diagnostic genes, among which IGF1R and SPP1 emerged as stable hub genes with AUCs greater than 0.70 across multiple external cohorts. Immune infiltration, single-cell, and spatial transcriptomic analyses demonstrated distinct immune associations and cell type– and region-specific expression patterns of these hub genes. Drug–gene enrichment identified 176 drug signatures and 445 related compounds, of which 37 grade-A molecules remained after ADMET and drug-likeness filtering. Molecular docking revealed four top-ranked compounds with binding energies better than −9.0 kcal/mol, including one ligand with a minimum binding energy of −10.5 kcal/mol and extensive non-covalent interactions with the target protein.
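The stability criterion reported above (AUC greater than 0.70 across multiple external cohorts) can be sketched as follows. This is an illustrative assumption about how such a check might be coded, not the authors' implementation: AUC is computed from its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control), and the cohort names, labels, and scores are hypothetical toy values.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stable_across_cohorts(cohorts, threshold=0.70):
    """Return per-cohort AUCs and whether every AUC exceeds the threshold."""
    aucs = {name: roc_auc(y, s) for name, (y, s) in cohorts.items()}
    return aucs, all(a > threshold for a in aucs.values())


# Hypothetical toy cohorts: (labels, gene-expression-based scores).
toy_cohorts = {
    "cohort_1": ([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]),
    "cohort_2": ([1, 1, 0, 0], [0.7, 0.6, 0.65, 0.1]),
}

aucs, stable = stable_across_cohorts(toy_cohorts)
print(aucs, stable)
```

The pairwise comparison is quadratic in cohort size, which is fine for a sketch; a rank-sum formulation would be preferable for large cohorts.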
Conclusion:
This study developed a systematic methodological framework spanning gene discovery, diagnostic modeling, and lead-compound screening. IGF1R and SPP1 were identified as stable and biologically interpretable AD hub genes that may serve as diagnostic markers, and several high-affinity small-molecule compounds targeting these hub genes provide new drug candidates for targeted AD therapy.

