
Integrated Bioinformatics and Ensemble Learning Reveal Diagnostic Modeling and Drug Discovery in Alzheimer’s Disease


Springer Science and Business Media LLC

Wang Zifu, Hou Jinqi, Zhu Yuxuan, Chenyun Guan

2025


Abstract

Background: Alzheimer’s disease (AD) is driven by complex molecular and immune dysregulation, yet reliable diagnostic biomarkers and druggable targets remain limited.

This study aimed to identify key AD-associated regulatory genes, characterize their immune and spatial expression features, and prioritize small-molecule compounds with therapeutic potential.

Methods: Multiple AD-related transcriptomic datasets—including bulk RNA-seq, microarray, and spatial transcriptomic profiles—were retrieved from GEO and systematically partitioned into discovery (GSE5281, GSE66333), validation (GSE110226, GSE28146, GSE29378), independent testing (GSE29378), and spatial validation cohorts (GSE147047).

Differential expression analysis was used to identify genes that differ between AD and control samples, and weighted gene co-expression network analysis (WGCNA) was used to construct co-expression networks and define AD-associated gene modules.
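The network-construction step can be illustrated with the standard unsigned WGCNA formulation: a soft-thresholded adjacency a_ij = |cor(x_i, x_j)|^β, followed by the topological overlap matrix TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij), where l_ij = Σ_u a_iu a_uj and k_i is gene i's connectivity. This NumPy sketch is illustrative only: the power β = 6 and the random toy matrix are assumptions rather than settings reported by the study, and module detection (hierarchical clustering of 1 − TOM) is left out.

```python
import numpy as np

def soft_threshold_adjacency(expr: np.ndarray, beta: int = 6) -> np.ndarray:
    """Unsigned WGCNA-style adjacency; expr is samples x genes, beta is an assumed power."""
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene Pearson correlation
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-connections
    return adj

def topological_overlap(adj: np.ndarray) -> np.ndarray:
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    shared = adj @ adj             # l_ij = sum_u a_iu * a_uj (zero diagonal drops u = i, j)
    k = adj.sum(axis=1)            # connectivity of each gene
    min_k = np.minimum.outer(k, k)
    tom = (shared + adj) / (min_k + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom

# Toy data (hypothetical): 30 samples x 8 genes, with genes 0 and 1 made co-expressed.
rng = np.random.default_rng(0)
expr = rng.normal(size=(30, 8))
expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=30)
tom = topological_overlap(soft_threshold_adjacency(expr))
dissimilarity = 1.0 - tom  # what module detection would cluster on
```

Genes that share many strong neighbours get high topological overlap even when their direct correlation is moderate, which is why WGCNA clusters on 1 − TOM rather than on raw correlation.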

Protein–protein interaction (PPI) analysis and multiple network centrality measures were then applied to prioritize candidate key genes.
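The abstract does not name the centrality measures that were combined. As an illustration only, this pure-Python sketch computes degree and closeness centrality on a small undirected graph (closeness via breadth-first search) and orders nodes by the sum of their per-measure ranks. The edge list and the node labels G1 to G6 are hypothetical, not the study's PPI network.

```python
from collections import deque

def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree_centrality(graph):
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """(reachable nodes - 1) / sum of shortest-path distances, found by BFS."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores

def consensus_rank(*measures):
    """Sum of per-measure ranks (1 = most central); lowest total comes first."""
    totals = {}
    for scores in measures:
        ordered = sorted(scores, key=lambda v: -scores[v])
        for rank, v in enumerate(ordered, start=1):
            totals[v] = totals.get(v, 0) + rank
    return sorted(totals, key=lambda v: (totals[v], v))

# Hypothetical toy network; these edges are not from the study.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5"), ("G5", "G6")]
graph = build_graph(edges)
ranking = consensus_rank(degree_centrality(graph), closeness_centrality(graph))
```

Summing ranks rather than raw scores keeps measures on different scales from dominating one another; ties are broken by insertion order here, which a fuller implementation would handle explicitly.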

Twelve machine-learning algorithms were combined into 127 classification models, and SHAP-based interpretability analysis was used to quantify feature contributions and identify diagnostic genes.
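SHAP attributes a prediction to input features using Shapley values: feature i receives the weighted average, over coalitions S of the other features, of the change f(S ∪ {i}) − f(S). With only a few features this can be computed exactly by enumeration, which makes the definition concrete. The sketch assumes a baseline-replacement value function (features outside S take a reference value) and a toy linear score with made-up weights; it is not the study's model, and the SHAP library itself uses faster estimators such as TreeSHAP for tree ensembles like random forests.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        point = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(point)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical linear score over three gene-expression features (weights are made up).
weights = [0.8, -0.5, 0.2]

def model(v):
    return sum(w * xi for w, xi in zip(weights, v))

x = [2.0, 1.0, 3.0]
baseline = [1.0, 1.0, 1.0]
phi = shapley_values(model, x, baseline)
```

For a linear model the attributions reduce to w_i (x_i − b_i), and they always sum to f(x) − f(baseline), the efficiency property that lets SHAP values be read as additive contributions.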

Single-cell and spatial transcriptomic data were further used to validate the cell-type specificity and spatial localization of the hub genes.
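How cell-type specificity was scored is not stated in the abstract. One widely used summary is the tau (τ) index, τ = Σ_i (1 − x_i / max_j x_j) / (n − 1), computed over a gene's mean expression x_i in each of n cell types; it is 0 for uniform expression and 1 when the gene is expressed in a single cell type. Both the sketch and its per-cell-type values are hypothetical.

```python
def tau_specificity(mean_expr):
    """Tau index over per-cell-type mean expression: 0 = uniform, 1 = one cell type only."""
    values = list(mean_expr.values())
    n = len(values)
    peak = max(values)
    if n < 2 or peak <= 0:
        raise ValueError("need at least two cell types and some positive expression")
    return sum(1 - v / peak for v in values) / (n - 1)

# Hypothetical per-cell-type mean expression (e.g. log-normalized counts); not study data.
gene_profile = {"microglia": 4.0, "astrocytes": 0.5, "neurons": 0.2, "oligodendrocytes": 0.3}
score = tau_specificity(gene_profile)
top_cell_type = max(gene_profile, key=gene_profile.get)
```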

Drug–gene enrichment analysis (DSigDB), compound retrieval (PubChem), ADMET and drug-likeness profiling, and molecular blind docking were integrated to screen and evaluate potential lead compounds.
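Drug-gene enrichment against a signature library such as DSigDB is commonly framed as an over-representation test: with N background genes, a drug signature of K genes, and a query list of n genes of which k fall in the signature, the p-value is the upper tail P(X ≥ k) of a hypergeometric distribution. The abstract does not state which test or tool was used, so this pure-Python sketch (built on `math.comb`) and its counts are assumptions for illustration.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical numbers: 20,000 background genes, a 150-gene drug signature,
# and a 15-gene query list with 4 genes in that signature.
p_value = hypergeom_sf(k=4, N=20_000, K=150, n=15)
```

In practice such p-values are computed for every signature in the library and corrected for multiple testing before signatures are ranked.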

Results: We identified 2,534 differentially expressed genes (DEGs) between AD and control samples, and their intersection with WGCNA-derived modules yielded 848 candidate genes.
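The step from 2,534 DEGs to 848 candidates is a set intersection between the DEG list and the genes of AD-associated WGCNA modules. The sketch shows one common way to produce such a list: Benjamini–Hochberg adjustment of per-gene p-values, a combined significance and fold-change filter, then the intersection. The cut-offs (adjusted p < 0.05, |log2FC| > 0.5) and the gene records are hypothetical; the abstract does not report the thresholds used.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(stats, alpha=0.05, min_abs_lfc=0.5):
    """stats: {gene: (log2_fold_change, p_value)}; both thresholds are assumed values."""
    genes = list(stats)
    padj = benjamini_hochberg([stats[g][1] for g in genes])
    return {g for g, q in zip(genes, padj) if q < alpha and abs(stats[g][0]) > min_abs_lfc}

# Hypothetical inputs for illustration only.
stats = {"GENE_A": (1.2, 0.001), "GENE_B": (-0.9, 0.004),
         "GENE_C": (0.1, 0.0001), "GENE_D": (2.0, 0.30)}
module_genes = {"GENE_A", "GENE_C", "GENE_E"}
candidates = select_degs(stats) & module_genes
```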

PPI-based network analysis prioritized 15 key genes, on which 127 machine-learning models were constructed; the random forest model achieved the best overall performance, with an average AUC of 0.957.

SHAP analysis identified 11 key diagnostic genes, among which IGF1R and SPP1 emerged as stable hub genes with AUCs greater than 0.70 across multiple external cohorts.
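AUC can be computed without drawing a curve: it equals the probability that a randomly chosen AD sample scores higher than a randomly chosen control, with ties counted as one half. The sketch applies that definition per cohort and calls a gene stable only if it exceeds 0.70 in every cohort, which is one plausible reading of the criterion in the sentence above rather than the study's documented rule. Cohort names and expression values are hypothetical, and the code assumes higher expression indicates AD.

```python
def auc(scores_pos, scores_neg):
    """ROC AUC as P(pos > neg) + 0.5 * P(pos == neg) over all positive/negative pairs."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def is_stable(cohorts, threshold=0.70):
    """cohorts: {name: (ad_values, control_values)} for a single gene."""
    per_cohort = {name: auc(ad, ctrl) for name, (ad, ctrl) in cohorts.items()}
    return all(v > threshold for v in per_cohort.values()), per_cohort

# Hypothetical expression values of one gene in two external cohorts.
cohorts = {
    "cohort_1": ([5.1, 6.3, 5.8, 7.0], [4.2, 4.9, 5.5, 3.8]),
    "cohort_2": ([2.2, 2.9, 3.1], [2.0, 2.5, 2.9]),
}
stable, per_cohort = is_stable(cohorts)
```

The pairwise loop is quadratic in sample size; for larger cohorts the same quantity follows from the Mann–Whitney U statistic computed on ranks.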

Immune infiltration, single-cell, and spatial transcriptomic analyses demonstrated distinct immune associations and cell type– and region-specific expression patterns of these hub genes.
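Immune-infiltration analyses typically relate a hub gene's expression to estimated immune-cell fractions across samples, often with Spearman's rank correlation. Because the abstract names neither the deconvolution method nor the statistic, the sketch only shows Spearman's ρ, computed as the Pearson correlation of average ranks so that ties are handled, on hypothetical values.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical values: hub-gene expression vs. an estimated immune-cell fraction.
expression = [3.2, 4.1, 2.8, 5.0, 4.4, 3.9]
immune_fraction = [0.10, 0.20, 0.08, 0.25, 0.18, 0.15]
rho = spearman_rho(expression, immune_fraction)
```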

Drug–gene enrichment identified 176 drug signatures and 445 related compounds, of which 37 grade-A molecules remained after ADMET and drug-likeness filtering.
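What qualifies a molecule as grade-A is not defined in the abstract. A common first drug-likeness screen is Lipinski's rule of five (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors), usually tolerating one violation. This sketch applies that rule to precomputed descriptors; the compound records are hypothetical, and a real pipeline would compute descriptors with a cheminformatics toolkit and layer ADMET predictions on top.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float  # Da
    logp: float
    h_donors: int
    h_acceptors: int

def lipinski_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five limits a compound exceeds."""
    rules = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ]
    return sum(rules)

def passes_rule_of_five(c: Compound, max_violations: int = 1) -> bool:
    return lipinski_violations(c) <= max_violations

# Hypothetical descriptor values, not compounds from the study.
library = [
    Compound("cmpd_1", 342.4, 2.1, 2, 5),
    Compound("cmpd_2", 612.7, 6.3, 4, 9),
    Compound("cmpd_3", 515.0, 4.2, 3, 8),
]
kept = [c.name for c in library if passes_rule_of_five(c)]
```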

Molecular docking revealed four top-ranked compounds with binding energies better than −9.0 kcal/mol, including one ligand with a minimum binding energy of −10.5 kcal/mol and extensive non-covalent interactions with the target protein.
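Docking energies are sometimes translated into an approximate dissociation constant through ΔG = RT ln K_d, that is K_d = exp(ΔG / RT). Docking scores are coarse estimates rather than measured free energies, so this conversion only conveys scale; the temperature of 298.15 K is an assumption.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_binding_energy(delta_g_kcal, temperature_k=298.15):
    """Approximate dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

for dg in (-9.0, -10.5):
    print(f"{dg:6.1f} kcal/mol -> Kd ~ {kd_from_binding_energy(dg) * 1e9:.0f} nM")
```

At 298.15 K, −9.0 and −10.5 kcal/mol correspond to roughly 250 nM and 20 nM, and each additional ~1.36 kcal/mol (RT ln 10) lowers the estimated K_d about tenfold.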

Conclusion: This study developed a systematic methodological framework that spans gene discovery, diagnostic modeling, and lead-compound screening.

IGF1R and SPP1 were identified as stable and biologically interpretable AD hub genes with potential as diagnostic markers, and several high-affinity small-molecule compounds identified on the basis of these hub genes provide new drug candidates for targeted AD therapy.
