
The Protocol Genome: A Self-Supervised Learning Framework from DICOM Headers

In this paper, we propose the Protocol Genome, a self-supervised learning framework from DICOM headers, achieving AUROC 0.901 (vs. 0.847 baseline) and ECE 0.036 (vs. 0.058) on fully held-out external validation. Our method demonstrates significantly improved calibration and robustness across multiple modalities (CT, MRI, CXR) and vendors. Clinical imaging flows through PACS and DICOM, whose protocols (scanner make/model; sequence; reconstruction kernel; kVp; TR/TE; slice thickness) dictate contrast, noise, and artifact profiles. These protocol choices give rise to hidden confounders that hinder cross-site generalization of image-only neural networks and complicate multi-center deployment. We present the Protocol Genome, a self-supervised learning (SSL) framework that treats structured DICOM headers as a genomic code and learns protocol-aware yet clinically robust image representations. The Protocol Genome extracts tokenized embeddings for de-identified DICOM header fields and aligns them with image features through: (1) protocol–image contrastive learning, (2) masked protocol prediction, and (3) protocol–protocol translation across series. We experiment with 1.26M studies (7 health systems, 31 scanners from 3 vendors; CT, MR, CR/DR modalities) and evaluate on three downstream tasks: (A) chest CT triage for acute PE, (B) brain MRI low-grade vs. high-grade glioma classification, and (C) chest radiograph cardiomegaly detection. Compared to strong SSL baselines (SimCLR, MAE) and ImageNet transfer, Protocol Genome pretraining increases external-site AUROC by +0.046 (95% CI: +0.031–+0.060) for PE, +0.058 (+0.036–+0.079) for glioma, and +0.041 (+0.028–+0.054) for cardiomegaly; calibration (ECE) improves by 25–37%. DeLong tests further support significance (all p<0.001). Ablations indicate that the gains remain with 10–20% labeled data.
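The calibration figures above are reported as expected calibration error (ECE). The following is a minimal sketch of one common way to compute ECE for a binary task, assuming equal-width probability bins. The abstract does not state the binning scheme or bin count, so the function name, `n_bins=10`, and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE with equal-width bins (an assumed, common formulation).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : ground-truth labels, each 0 or 1
    Returns the bin-size-weighted mean of |observed positive rate - mean predicted prob|.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin; p == 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(pos_rate - mean_prob)
    return ece


# Toy example (made-up numbers, not data from the paper).
toy_ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
```

In this toy case each occupied bin is off by 0.1 (predicted 0.9 vs. observed 1.0, and predicted 0.1 vs. observed 0.0), so the result is about 0.1. A lower value means predicted probabilities track observed frequencies more closely.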
Clinically, the method can help reduce false positives at protocol boundaries and can be integrated into a PACS (DICOM C-FIND/C-MOVE, DICOMweb QIDO/WADO). We release a model card and deployment recommendations, including de-identification and bias-auditing steps.
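As a rough illustration of the DICOMweb side of such an integration, the sketch below builds a QIDO-RS study-search URL using only the Python standard library. The base URL, function name, and chosen filters are hypothetical. The query keys (`ModalitiesInStudy`, `StudyDate`, `includefield`, `limit`, `offset`) are standard QIDO-RS parameters, but a given PACS may support only a subset, and this is not the deployment procedure described by the authors.

```python
from urllib.parse import urlencode


def build_qido_study_query(base_url, modality, date_range, limit=100, offset=0):
    """Return a QIDO-RS URL that searches for studies by modality and date range.

    base_url   : root of the DICOMweb service (hypothetical in this example)
    modality   : DICOM modality code, e.g. "CT", "MR", "CR", "DX"
    date_range : DICOM date range in the form "YYYYMMDD-YYYYMMDD"
    """
    params = [
        ("ModalitiesInStudy", modality),
        ("StudyDate", date_range),
        ("includefield", "StudyDescription"),  # request an extra attribute in the response
        ("limit", str(limit)),
        ("offset", str(offset)),
    ]
    return f"{base_url.rstrip('/')}/studies?{urlencode(params)}"


# Hypothetical endpoint; replace with the DICOMweb root of the target PACS.
example_url = build_qido_study_query(
    "https://pacs.example.org/dicom-web", "CT", "20230101-20231231"
)
```

The resulting URL would be sent as an HTTP GET with an `Accept: application/dicom+json` header; retrieving the matching objects would then be a separate WADO-RS request.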

Related Results

The Role of Standards in Accelerating the Uptake of Artificial Intelligence in Dermatology (Preprint)
BACKGROUND The use of artificial intelligence (AI) for dermatology is showing great promise in research contexts. However, the clinical use of AI in dermato...
CytometryML, an XML format based on DICOM and FCS for analytical cytology data
Abstract Background: Flow Cytometry Standard (FCS) was initially created to standardize the software researchers use to analyze, transmit, and store data produced by flow cytometers a...
Web Validation Service for Ensuring Adherence to the DICOM Standard
The DICOM Standard has been fundamental for ensuring the interoperability of Picture Archiving and Communication Systems (PACS). By complying rigorously with the standard, medical ima...
Reversible Anonymization of DICOM Images Using Automatically Generated Policies
Many real-world applications in the area of medical imaging like case study databases require separation of identifying (IDATA) and non-identifying (MDATA) data, specifically those...
Automatic Selective Encryption of DICOM Images
Securing DICOM images is essential to protect the privacy of patients, especially in the era of telemedicine and eHealth/mHealth. This increases the demand for rapid security. Neve...
DICOM segmentation and STL creation for 3D Printing: A Process and Software Package Comparison for Osseous Anatomy
Abstract Background: Extracting and three-dimensional (3D) printing an organ in a region of interest in DICOM images typically calls for segmentation as a first step in sup...
