Fine-Tuning for Accuracy: Evaluation of GPT for Automatic Assignment of ICD Codes to Clinical Documentation
Abstract
Background:
Assignment of International Classification of Diseases (ICD) codes to clinical documentation is a tedious but important task that is mostly done manually. This study evaluated how well OpenAI's widely used Generative Pre-trained Transformer (GPT) 3.5 Turbo model can help automate the assignment of ICD codes to clinical notes.
Methods:
We identified the 10 most prevalent ICD-10 codes in the Medical Information Mart for Intensive Care (MIMIC-IV) dataset. For each code, we selected 200 notes and randomly split them into two equal groups of 100, one for training and one for testing. We then passed each note to GPT 3.5 Turbo via OpenAI's API, prompting the model to assign ICD-10 codes to it, and checked whether the model's response contained the target ICD-10 code. After fine-tuning the GPT model on the training data, we repeated the process with the test data and compared the fine-tuned model's performance against that of the default model.
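The per-code sampling and 100/100 split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the authors' code. It assumes the notes and their ICD-10 labels are already loaded into a plain dictionary, the hypothetical `note_codes`, which maps a note ID to its list of codes. The function names `top_codes` and `split_per_code` and the fixed random seed are also illustrative choices.

```python
import random
from collections import Counter


def top_codes(note_codes, k=10):
    """Return the k most frequent ICD-10 codes across all notes.

    `note_codes` is a hypothetical mapping: note ID -> list of ICD-10 codes.
    """
    counts = Counter(code for codes in note_codes.values() for code in codes)
    return [code for code, _ in counts.most_common(k)]


def split_per_code(note_codes, codes, per_code=200, seed=0):
    """For each code, sample `per_code` notes carrying it and split them in half.

    Returns {code: (train_ids, test_ids)}. The fixed seed (an assumption) makes
    the random split reproducible; rng.sample raises ValueError if fewer than
    `per_code` notes carry the code.
    """
    rng = random.Random(seed)
    splits = {}
    for code in codes:
        # Sort candidate IDs so the sample does not depend on dict ordering.
        candidates = sorted(nid for nid, cs in note_codes.items() if code in cs)
        sample = rng.sample(candidates, per_code)
        half = per_code // 2
        splits[code] = (sample[:half], sample[half:])
    return splits


if __name__ == "__main__":
    # Synthetic labels for illustration only; not MIMIC-IV data.
    demo = {f"n{i}": (["I10", "E78.5"] if i % 2 else ["I10"]) for i in range(500)}
    codes = top_codes(demo, k=2)
    for code, (train, test) in split_per_code(demo, codes).items():
        print(code, len(train), len(test))
```

The abstract leaves one design point open: a note can carry several of the top 10 codes. This sketch may therefore draw the same note for two codes, even placing it in the training half of one code and the test half of another. A real pipeline would likely need to deduplicate notes across codes to avoid leakage between training and testing.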
Results:
Initially, the default GPT 3.5 Turbo model included the target ICD-10 code among its assigned codes in 29.7% of cases. After fine-tuning with 100 notes for each of the 10 top codes, accuracy improved to 62.6%.
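The evaluation criterion is whether the target code appears anywhere among the codes the model assigned. This amounts to a hit rate over (target code, model response) pairs. The sketch below rests on several assumptions, because the abstract does not say how responses were parsed. It assumes the response is free text and extracts ICD-10-like strings with a rough regular expression. It then compares codes with the dot removed. The regex, the normalization, and the names `extract_codes` and `hit_rate` are all hypothetical.

```python
import re

# Rough ICD-10 shape (an assumption): a letter, a digit, an alphanumeric
# character, then an optional, possibly dotted, extension of up to four
# characters.
ICD10_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.?[0-9A-Z]{1,4})?\b")


def extract_codes(response_text):
    """Pull ICD-10-like codes from free text, normalized to upper case without dots."""
    return {m.replace(".", "") for m in ICD10_PATTERN.findall(response_text.upper())}


def hit_rate(records):
    """Fraction of (target_code, response_text) pairs whose response contains the target."""
    if not records:
        raise ValueError("no records to score")
    hits = sum(
        target.replace(".", "").upper() in extract_codes(response)
        for target, response in records
    )
    return hits / len(records)


if __name__ == "__main__":
    # Made-up responses for illustration only.
    demo = [("I10", "Codes: I10, E78.5"), ("N17.9", "Assigned: I10")]
    print(hit_rate(demo))
```

Applied to the reported figures, fine-tuning raised the hit rate by 62.6 - 29.7 = 32.9 percentage points, roughly a 2.1-fold relative improvement (62.6 / 29.7 ≈ 2.11). This metric credits a response for containing the one target code per note. It does not penalize extra, incorrect codes in the same response.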
Conclusions:
Historically, GPT's performance on healthcare-related tasks has been suboptimal. Fine-tuning, as performed in this study, shows considerable potential to improve performance. It points to a path for integrating Artificial Intelligence (AI) into healthcare to make this administrative task more efficient and accurate. Future research should focus on expanding the training datasets with specialized data and on integrating these models into existing healthcare systems to maximize their utility and reliability.
Springer Science and Business Media LLC