Performance of Novel GPT-4 in Otolaryngology Knowledge Assessment

Abstract

Purpose: GPT-4, recently released by OpenAI, improves upon GPT-3.5 with increased reliability and expanded capabilities, including user-specified, customizable GPT-4 models. This study investigates how GPT-4 performs relative to GPT-3.5 on Otolaryngology board-style questions.

Methods: 150 Otolaryngology board-style questions were obtained from the BoardVitals question bank. These questions, which were previously assessed with GPT-3.5, were entered into standard GPT-4 and into a custom GPT-4 model designed to specialize in Otolaryngology board-style questions, emphasize precision, and provide evidence-based explanations.

Results: Standard GPT-4 correctly answered 72.0% of the questions and custom GPT-4 correctly answered 81.3%, whereas GPT-3.5 answered 51.3% of the same questions correctly. On multivariable analysis, custom GPT-4 had higher odds of correctly answering questions than standard GPT-4 (adjusted odds ratio 2.19, P = 0.015). Both standard GPT-4 and custom GPT-4 performed worse on questions rated as hard than on questions rated as easy (P < 0.001).

Conclusions: Our study suggests that GPT-4 has higher accuracy than GPT-3.5 in answering Otolaryngology board-style questions. Our custom GPT-4 model demonstrated higher accuracy than standard GPT-4, potentially as a result of its instructions to specialize in Otolaryngology board-style questions, select exactly one answer, and emphasize precision. This demonstrates that custom models may further enhance the use of ChatGPT in medical education.
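As a rough illustration of the arithmetic behind the reported accuracies, the sketch below back-calculates the number of correct answers from the percentages and the 150-question total, then computes a simple unadjusted odds ratio for custom GPT-4 versus standard GPT-4. The counts and the helper names (`correct_count`, `odds`, `unadjusted_odds_ratio`) are assumptions made for this example. This is not the authors' multivariable model, so the result is not expected to match the adjusted odds ratio of 2.19.

```python
# Minimal sketch: recover approximate counts from the reported percentages
# and compute an UNADJUSTED odds ratio. The counts are back-calculated
# assumptions, not figures reported by the authors.

TOTAL_QUESTIONS = 150  # stated in the abstract

# Accuracies as reported in the abstract.
accuracy = {
    "GPT-3.5": 0.513,
    "standard GPT-4": 0.720,
    "custom GPT-4": 0.813,
}


def correct_count(rate: float, total: int = TOTAL_QUESTIONS) -> int:
    """Nearest whole number of correct answers implied by an accuracy rate."""
    return round(rate * total)


def odds(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Odds of a correct answer: correct / incorrect."""
    return correct / (total - correct)


def unadjusted_odds_ratio(correct_a: int, correct_b: int,
                          total: int = TOTAL_QUESTIONS) -> float:
    """Odds of model A answering correctly divided by the odds for model B."""
    return odds(correct_a, total) / odds(correct_b, total)


counts = {name: correct_count(rate) for name, rate in accuracy.items()}
or_custom_vs_standard = unadjusted_odds_ratio(
    counts["custom GPT-4"], counts["standard GPT-4"]
)

print(counts)
print(round(or_custom_vs_standard, 2))
```

Under these assumptions the unadjusted ratio comes out near 1.69. It differs from the reported 2.19 because the published estimate comes from a multivariable analysis, which this sketch does not attempt to reproduce.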

Springer Science and Business Media LLC

Lucy Revercomb, Aman M. Patel, Daniel Fu, Andrey Filimonov

Indian Journal of Otolaryngology and Head & Neck Surgery

2024

The development of artificial intelligence (AI) technology, particularly large language models such as the Generative Pre-trained Transformer (GPT), has brought about a transformation...

GPT-agents based on medical guidelines can improve the responsiveness and explainability of outcomes for traumatic brain injury rehabilitation

Abstract This study explored the application of generative pre-trained transformer (GPT) agents based on medical guidelines using large language model (LLM) technology for traumatic...

Performance of ChatGPT in Ophthalmic Registration and Clinical Diagnosis: Cross-Sectional Study (Preprint)

BACKGROUND Artificial intelligence (AI) chatbots such as ChatGPT are expected to impact vision health care significantly. Their potential to optimize the co...

Influence of Model Evolution and System Roles on ChatGPT’s Performance in Chinese Medical Licensing Exams: Comparative Study

Abstract Background With the increasing application of large language models like ChatGPT in various industries, its potential in the medical dom...

Reporting guidelines and journal quality in otolaryngology

Objectives Journals increasingly use reporting guidelines to standardise research papers, partly to improve quality. Although defining journal quality is difficult, various calculat...

Multimodal Performance of GPT-4 in Complex Ophthalmology Cases

Objectives: The integration of multimodal capabilities into GPT-4 represents a transformative leap for artificial intelligence in ophthalmology, yet its utility in scenarios requir...

Developing artificial intelligence tools for institutional review board pre-review: A pilot study on ChatGPT’s accuracy and reproducibility

Abstract This pilot study is the first phase of a broader project aimed at developing an explainable artificial intelligence (AI) tool to support the ethical evalua...

Model Evolution and System Roles Influence the Performance of ChatGPT on Chinese Medical Licensing Exams: A Comparative Study (Preprint)

BACKGROUND With the increasing application of Large Language Models (LLMs) like ChatGPT in various industries, its potential in the medical domain, especial...
