Enhancing Prediabetes Diagnosis from Continuous Glucose Monitoring Data via Iterative Label Cleaning and Deep Learning of Bridge2AI AI-READI Data
ABSTRACT
As of early 2026, over 115 million US adults (more than 1 in 3) have prediabetes, a condition that converts to type 2 diabetes at an annual rate of 5%–10%. Total diabetes (diagnosed and undiagnosed) affects approximately 40.1 million Americans, or 12% of the population, with roughly 1.5 million new cases diagnosed annually. Continuous glucose monitoring (CGM) provides real-time, around-the-clock insight into glycemic variability, detecting dangerous highs, lows, and trends that HbA1c (a 3-month average) misses. For instance, it enables identification of nocturnal hypoglycemia and postprandial spikes, which supports personalized, actionable treatment decisions and improves safety. The Artificial Intelligence Ready and Exploratory Atlas for Diabetes Insights (AI-READI) dataset was produced by the National Institutes of Health (NIH) Common Fund Data Ecosystem (CFDE) Bridge2AI program. It offers a rich resource for diabetes research, providing comprehensive biosensor data from over 1,067 participants. However, like many medical datasets, AI-READI contains label inaccuracies that stem from self-reported health surveys and static HbA1c indicators, and these inaccuracies can undermine model effectiveness. We developed a classification framework based on a Convolutional-Bidirectional Long Short-Term Memory (Conv+BiLSTM) network to classify glycemic health states from CGM time-series data. Our aim was to identify and correct misclassified labels through hybrid unsupervised-supervised learning methods and to validate the results with expert-in-the-loop clinical review. We analyzed 784 participants from the AI-READI dataset, representing four health states: healthy, prediabetes lifestyle controlled, oral medication, and insulin-dependent.
Based on recommendations from the literature and our own expertise, we compared the self-reported “healthy” group labels with a cluster-agnostic, CGM-defined healthy (CGM-H) reference. To derive this reference, we applied K-means clustering (K=6) to standardized CGM summary features to identify CGM-H participants, and then applied XGBoost-based iterative label refinement. We identified a misclassification rate of 56.9% (161/283) in the initially labeled “healthy” group. After eight iterations of XGBoost refinement with dual-criterion relabeling (≥80% predicted probability and unanimous out-of-fold voting), the cleaned dataset contained 195 CGM-H participants, up from 122, for binary classification. Next, we developed a Conv+BiLSTM model that combines convolutional layers (32 and 64 filters) for local temporal feature extraction with bidirectional LSTM layers (64 and 32 units) for sequence modeling. Its inputs were engineered time-series features, including rolling statistics, glucose derivatives, and circadian rhythm encoding. Class imbalance was addressed with per-class weighting, and generalization performance was estimated with 5-fold stratified cross-validation. A global decision threshold (0.374) was computed by maximizing Youden’s J statistic on the concatenated out-of-fold predictions. Additionally, we analyzed heart rate, activity level, stress, and sleep data and validated them against the CGM data. The Conv+BiLSTM model achieved ROC-AUC ≈ 0.932 on the held-out test set and 0.907 ± 0.026 in cross-validation, with well-calibrated predictions (Expected Calibration Error = 0.075, temperature scaling T = 1.00). A 3-tier confidence-based decision system achieved an 82% detection rate with only a 6% oral glucose tolerance test (OGTT) burden, enabling actionable clinical recommendations. This hybrid approach addressed label noise while achieving high discrimination, and the framework demonstrates potential for real-time glycemic state monitoring and early intervention in diabetes progression.
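The thresholding and calibration steps named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = positive class) and predicted probabilities pooled across cross-validation folds. The function names `youden_threshold` and `expected_calibration_error`, the toy fold data, and the equal-width binning for the calibration error are all illustrative assumptions, not the authors' code or data. Youden's J is J = sensitivity + specificity − 1, and the chosen threshold is the one that maximizes it.

```python
from typing import List, Sequence, Tuple


def youden_threshold(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its probability is >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required to compute Youden's J")

    best_threshold, best_j = 1.0, -1.0
    # Only observed probabilities can change the confusion matrix,
    # so they are the only candidate thresholds worth scanning.
    for threshold in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE for the positive class (one common variant).

    Each bin contributes |observed positive rate - mean predicted probability|,
    weighted by the fraction of samples that fall in the bin.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))

    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        positive_rate = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        ece += (len(members) / total) * abs(positive_rate - mean_prob)
    return ece


# Made-up out-of-fold predictions from two folds (illustration only).
fold_labels = [[0, 0, 1, 1], [0, 1, 1, 0]]
fold_probs = [[0.10, 0.30, 0.45, 0.90], [0.20, 0.40, 0.80, 0.35]]

# Concatenate folds first, then pick a single global threshold.
pooled_labels = [y for fold in fold_labels for y in fold]
pooled_probs = [p for fold in fold_probs for p in fold]

threshold, j_stat = youden_threshold(pooled_labels, pooled_probs)
ece = expected_calibration_error(pooled_labels, pooled_probs)
print(f"threshold={threshold:.3f}  J={j_stat:.3f}  ECE={ece:.3f}")
```

Pooling the folds before scanning thresholds yields one global cut-off rather than a separate cut-off per fold, which matches the "concatenated out-of-fold predictions" wording in the abstract. The scan is quadratic in the number of samples, which is acceptable for a cohort of a few hundred participants.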
Related Results
Faktor-Faktor yang Berhubungan dengan Kejadian Prediabetes (Factors Associated with the Occurrence of Prediabetes)
Background: Prediabetes is the initial stage in the development of diabetes mellitus. Prediabetes has no characteristic presentation as diabetes mellitus does; however, the prevalence of prediabetes ...
CUT-OFF POINT FOR FASTING GLUCOSE IN DIAGNOSING PREDIABETES
Objective. This study aimed to evaluate the feasibility of using fasting glucose as a primary diagnostic criterion for prediabetes, and to determine the optimal cut-off point for d...
The 1-hour Plasma Glucose Predicts the Progression from Normal Glucose Tolerance to Prediabetes
Objective: To examine the ability of the 1-hour plasma glucose (PG) concentration during the OGTT to predict the risk of progression to prediabetes in NGT indivi...
A Community-Based Prediabetes Knowledge Assessment among Saudi Adults in Al-Ahsa Region, 2018
Background: Prediabetes has been considered to be a reversible condition; a modification of lifestyle and other intervention can be successfully applied during t...
Pregnancy and Challenging Transient Anti-GAD65 Positivity: A Case Report with Literature Review
Introduction: During pregnancy, women may develop blood glucose abnormalities like gestational diabetes mellitus (GDM) or, rarely, type 1 diabetes (T1D), which can lead to ...
Prevalence of Prediabetes in Adolescents
Prediabetes is closely associated with an increased risk of adult-onset diabetes, and the risk is even greater when it occurs in adolescence. Prediabetes is increasing globally acr...
Sex differences in the association between vitamin D and prediabetes in adults: A cross-sectional study
Background/Objectives: Vitamin D status has been shown to be associated with prediabetes risk. However, epidemiologic evidence on whether se...

