Performance of Novel GPT-4 in Otolaryngology Knowledge Assessment
Abstract
Purpose
GPT-4, recently released by OpenAI, improves upon GPT-3.5 with increased reliability and expanded capabilities, including user-specified, customizable GPT-4 models. This study aims to compare the performance of GPT-4 with that of GPT-3.5 on Otolaryngology board-style questions.
Methods
A total of 150 Otolaryngology board-style questions were obtained from the BoardVitals question bank. These questions, which had previously been assessed with GPT-3.5, were entered into standard GPT-4 and into a custom GPT-4 model designed to specialize in Otolaryngology board-style questions, emphasize precision, and provide evidence-based explanations.
Results
Standard GPT-4 correctly answered 72.0% and custom GPT-4 correctly answered 81.3% of the questions, compared with GPT-3.5, which answered 51.3% of the same questions correctly. On multivariable analysis, custom GPT-4 had higher odds of correctly answering questions than standard GPT-4 (adjusted odds ratio 2.19, P = 0.015). Both standard GPT-4 and custom GPT-4 performed worse on questions rated as hard than on questions rated as easy (P < 0.001).
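As a rough arithmetic check on these figures, the reported percentages can be converted back to question counts and a crude, unadjusted odds ratio computed. This is only a sketch: the counts are inferred by rounding the percentages against the 150 questions, the function names are illustrative, and the result is not expected to match the adjusted odds ratio of 2.19, which comes from the study's multivariable model.

```python
# Sketch only: the counts below are inferred from the reported percentages,
# not taken from the study's raw data.

N_QUESTIONS = 150  # number of board-style questions reported in Methods


def implied_correct(pct, n=N_QUESTIONS):
    """Round a reported percentage back to a whole number of questions."""
    return round(pct / 100 * n)


def unadjusted_odds_ratio(correct_a, correct_b, n=N_QUESTIONS):
    """Crude odds ratio of model A vs. model B answering correctly.

    Ignores covariates and the fact that both models saw the same
    questions, so it is not comparable to an adjusted odds ratio.
    """
    odds_a = correct_a / (n - correct_a)
    odds_b = correct_b / (n - correct_b)
    return odds_a / odds_b


reported_pct = {"GPT-3.5": 51.3, "standard GPT-4": 72.0, "custom GPT-4": 81.3}
counts = {name: implied_correct(pct) for name, pct in reported_pct.items()}
# counts == {"GPT-3.5": 77, "standard GPT-4": 108, "custom GPT-4": 122}

crude_or = unadjusted_odds_ratio(counts["custom GPT-4"], counts["standard GPT-4"])
print(counts)
print(f"Crude OR, custom vs. standard GPT-4: {crude_or:.2f}")  # 1.69
```

The crude value (about 1.69) is smaller than the reported adjusted odds ratio, which is unsurprising given that the adjusted estimate accounts for additional variables that this sketch does not model.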
Conclusions
Our study suggests that GPT-4 has higher accuracy than GPT-3.5 in answering Otolaryngology board-style questions. Our custom GPT-4 model demonstrated higher accuracy than standard GPT-4, potentially as a result of its instructions to specialize in Otolaryngology board-style questions, select exactly one answer, and emphasize precision. This suggests that custom models may further enhance the use of ChatGPT in medical education.
Springer Science and Business Media LLC
Related Results
Analysis of GPT Use in Optical Clinic I Learning at ARO Gapopin
The development of artificial intelligence (AI) technology, particularly large language models such as the Generative Pre-trained Transformer (GPT), has brought about major transformation...
GPT-agents based on medical guidelines can improve the responsiveness and explainability of outcomes for traumatic brain injury rehabilitation
Abstract: This study explored the application of generative pre-trained transformer (GPT) agents based on medical guidelines using large language model (LLM) technology for traumatic...
Performance of ChatGPT in Ophthalmic Registration and Clinical Diagnosis: Cross-Sectional Study (Preprint)
BACKGROUND
Artificial intelligence (AI) chatbots such as ChatGPT are expected to impact vision health care significantly. Their potential to optimize the co...
Influence of Model Evolution and System Roles on ChatGPT’s Performance in Chinese Medical Licensing Exams: Comparative Study
Abstract
Background
With the increasing application of large language models like ChatGPT in various industries, its potential in the medical dom...
Reporting guidelines and journal quality in otolaryngology
Objectives: Journals increasingly use reporting guidelines to standardise research papers, partly to improve quality. Although defining journal quality is difficult, various calculat...
Multimodal Performance of GPT-4 in Complex Ophthalmology Cases
Objectives: The integration of multimodal capabilities into GPT-4 represents a transformative leap for artificial intelligence in ophthalmology, yet its utility in scenarios requir...
Developing artificial intelligence tools for institutional review board pre-review: A pilot study on ChatGPT’s accuracy and reproducibility
Abstract
This pilot study is the first phase of a broader project aimed at developing an explainable artificial intelligence (AI) tool to support the ethical evalua...
Model Evolution and System Roles Influence the Performance of ChatGPT on Chinese Medical Licensing Exams: A Comparative Study (Preprint)
BACKGROUND
With the increasing application of Large Language Models (LLMs) like ChatGPT in various industries, its potential in the medical domain, especial...

