
Model Evolution and System Roles Influence the Performance of ChatGPT on Chinese Medical Licensing Exams: A Comparative Study (Preprint)

BACKGROUND With the increasing application of large language models (LLMs) such as ChatGPT across industries, their potential in the medical domain, especially in standardized examinations, has become a focal point of research.
OBJECTIVE To assess the clinical performance of ChatGPT, focusing on its accuracy and reliability on the Chinese National Medical Licensing Examination (CNMLE).
METHODS The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical sub-specialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the model version (GPT-3.5 or GPT-4.0), the use of prompts assigning system roles tailored to medical sub-specialties, and repetition to assess coherence. The passing accuracy threshold was set at 60%. χ² tests and Kappa values were used to evaluate the models' accuracy and consistency.
RESULTS GPT-4.0 achieved a passing accuracy of 71.0% to 74.7%, significantly higher than that of GPT-3.5 (50.3% to 54.8%, P < 0.001). Both models showed relatively high coherence between the initial and second responses, with Kappa values of 0.778 and 0.610. System roles boosted accuracy for both GPT-4.0 (0.3% to 3.7%) and GPT-3.5 (1.3% to 4.5%), and increased the Kappa by 0.023 and 0.035, respectively. In the multi-specialty analysis, GPT-4.0 passed the threshold in 14 of 15 sub-specialties, while GPT-3.5 did so in 7 of 15 on the first response.
CONCLUSIONS GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical sub-specialty expertise. Adding a system role enhanced the model's reliability and answer coherence. GPT-4.0 showed promising potential in medical education and clinical practice, meriting further study.
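
The abstract names χ² tests and Kappa values but does not show how they are computed. The following is a minimal sketch, not the authors' analysis code: it assumes Cohen's kappa on paired answer choices (first versus second response) and an uncorrected Pearson χ² test on a 2×2 table of correct and incorrect counts. The function names and the toy inputs are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa between two equally long sequences of categorical answers."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(first)
    # Observed agreement: share of questions answered identically both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal answer frequencies.
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)


def chi2_two_proportions(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    n = total_a + total_b
    rows = [total_a, total_b]
    cols = [correct_a + correct_b, n - correct_a - correct_b]
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            exp = rows[i] * cols[j] / n
            stat += (table[i][j] - exp) ** 2 / exp
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Toy answers for five hypothetical questions (not study data).
    first_try = ["A", "B", "C", "D", "A"]
    second_try = ["A", "B", "C", "D", "B"]
    print(cohen_kappa(first_try, second_try))
    # Toy counts: 30/50 correct versus 20/50 correct (not study data).
    print(chi2_two_proportions(30, 50, 20, 50))
```

Kappa is computed here on the answer choices themselves; a study could equally compute it on correct/incorrect labels, and the abstract does not say which was used.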

Related Results

Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Abstract Introduction The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...
Assessment of Chat-GPT, Gemini, and Perplexity in Principle of Research Publication: A Comparative Study
Abstract Introduction Many researchers utilize artificial intelligence (AI) to aid their research endeavors. This study seeks to assess and contrast the performance of three sophis...
Unlocking Educational Potential: Exploring Students’ Satisfaction and Sustainable Engagement with ChatGPT Using the ECM Model
Aim/Purpose: The main goal of this study is to investigate the factors affecting students’ satisfaction and continuous usage of ChatGPT in an educational context, using the Expecta...
Primerjalna književnost na prelomu tisočletja
In a comprehensive and at times critical manner, this volume seeks to shed light on the development of events in Western (i.e., European and North American) comparative literature ...
ChatGPT's Capabilities for Use in Anatomy Education and Anatomy Research
Dear Editors, Recently, the discussion of an artificial intelligence (AI) - fueled platform in several articles in your journal has attracted the attention of many researchers [1, ...
ChatGPT and Medical Education: A Double-Edged Sword
ChatGPT has gained attention worldwide. In the medical education field, ChatGPT, or any similar large language model, provides a convenient way for students to access information a...
ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case–Based Questions (Preprint)
BACKGROUND Large language models (LLMs), such as ChatGPT (Open AI), are increasingly used in medicine and supplement standard search engines as information ...
