Abstract 393: Generalizability of Deep Learning Models for Aneurysm Detection: A Systematic Review of Cross‐Center and Cross‐Modality Validation Performance
Introduction
Deep learning (DL) has demonstrated high diagnostic accuracy in detecting intracranial aneurysms (IAs) on CT angiography (CTA) and digital subtraction angiography (DSA). However, most studies rely on single‐center retrospective datasets, raising concerns about model generalizability across institutions and imaging modalities. This systematic review and meta‐analysis evaluates the external validity of DL models for IA detection, with emphasis on cross‐center and cross‐modality performance.
Methods
We systematically reviewed meta‐analyses, retrospective multicenter validations, and prospective trials that assessed DL algorithms for IA detection on both internal and external datasets. Extracted metrics included sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Internal validation (training/derivation cohorts) was compared with external validation (independent centers or modalities). Subgroup analyses focused on segmentation performance (Dice similarity coefficient, DSC) and prospective clinical deployment.
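For readers less familiar with the extracted metrics, the following is a minimal Python sketch of how sensitivity, specificity, and the Dice similarity coefficient are conventionally defined. The function names and the toy counts are illustrative assumptions, not taken from the reviewed studies, and segmentation masks are represented here simply as sets of voxel indices.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: detected aneurysms / all true aneurysms."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: correctly cleared cases / all negative cases."""
    return tn / (tn + fp)


def dice(pred_voxels, truth_voxels) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|).

    Masks are given as iterables of voxel indices (an assumption made
    for this sketch; real pipelines usually use binary arrays).
    """
    pred, truth = set(pred_voxels), set(truth_voxels)
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_sensitivity = sensitivity(tp=93, fn=7)
example_specificity = specificity(tn=180, fp=20)
example_dice = dice({1, 2, 3}, {2, 3, 4})
```

A Dice value is computed per segmented lesion or per scan, so a lower external Dice indicates poorer voxel-level overlap even when detection sensitivity is largely preserved.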
Results
Four representative studies encompassing over 15,000 patients were analyzed. Overall, DL models demonstrated excellent internal performance, with pooled sensitivity of 0.93 (95% CI: 0.90‐0.95) and AUC of 0.94 (95% CI: 0.92‐0.96). However, external validation revealed modest performance drops. Wei et al. (2024) reported sensitivity of 0.88 internally versus 0.857 externally, with AUC declining from 0.95 to 0.93. You et al. (2024) achieved internal sensitivity of 0.962 and a Dice coefficient of 0.78, but external validation yielded lower sensitivity (0.94) and Dice (0.71). Prospective validation by Hu et al. (2024) showed robust generalizability, with sensitivity of 0.957 internally and 0.943 externally and an AUC of 0.909 in both settings. Din et al. (2023) highlighted systematic reductions in accuracy when models were applied to unseen centers, with pooled sensitivity decreasing from 0.91 (internal) to 0.86 (external), a pattern consistent with overfitting. Across studies, pooled sensitivity on external testing fell by an average of 4.5%, and AUC declined by 0.03. Despite this, all models retained diagnostic accuracy comparable to radiologists, underscoring strong translational potential.
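The internal-versus-external comparison can be tabulated directly from the sensitivities reported above. The sketch below, in Python, computes the per-study absolute drop and its unweighted mean. This is only an illustration: the abstract does not state how its pooled estimate was weighted, and an unweighted mean of per-study differences is not the same quantity as a pooled estimate, so it need not reproduce the reported 4.5% figure. The variable names are assumptions introduced for this example.

```python
# (internal sensitivity, external sensitivity) as reported in the Results.
reported_sensitivity = {
    "Wei et al. (2024)": (0.88, 0.857),
    "You et al. (2024)": (0.962, 0.94),
    "Hu et al. (2024)": (0.957, 0.943),
    "Din et al. (2023)": (0.91, 0.86),
}

# Absolute drop per study: internal minus external.
sensitivity_drop = {
    study: internal - external
    for study, (internal, external) in reported_sensitivity.items()
}

# Simple unweighted mean of the four drops (no sample-size weighting).
unweighted_mean_drop = sum(sensitivity_drop.values()) / len(sensitivity_drop)

# Study with the largest internal-to-external gap.
largest_drop_study = max(sensitivity_drop, key=sensitivity_drop.get)

# Dice drop for the one study that reports segmentation overlap.
dice_drop_you_2024 = 0.78 - 0.71

for study, drop in sensitivity_drop.items():
    print(f"{study}: sensitivity drop = {drop:.3f}")
print(f"Unweighted mean drop = {unweighted_mean_drop:.4f}")
```

A weighted pooling scheme (for example, by patient count or inverse variance) would require per-study sample sizes, which the abstract does not give.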
Conclusion
Deep learning models for IA detection demonstrate strong internal validity but experience modest reductions in sensitivity and AUC when tested across centers and modalities. Performance declines are most evident in segmentation accuracy, with Dice coefficients dropping by ∼0.07 on external validation. Nonetheless, prospective multicenter trials confirm clinical robustness, with external sensitivity >94% in some models. These findings highlight both the promise and the challenge of achieving true generalizability. Future research should prioritize harmonized multi‐institutional datasets, standardized validation frameworks, and robust cross‐modality testing to ensure reliable deployment in diverse clinical environments.
Ovid Technologies (Wolters Kluwer Health)

