
The Power of Multimodality: Comparative Analysis of Multimodal Large Language Models, Unimodal ChatGPT-5.0, and Human Clinical Experts on Wound Care Certification Examination (Preprint)

Background: Multimodal large language models (MLLMs) capable of integrating visual and textual information represent a promising advancement for clinical applications requiring image interpretation. Wound care assessment, which demands simultaneous analysis of wound photographs and clinical data, provides an ideal domain for evaluating multimodal versus unimodal artificial intelligence capabilities against human expertise.

Objective: To compare the performance of MLLMs, unimodal ChatGPT-5.0, and human clinical experts on a standardized wound care certification examination.

Methods: This cross-sectional comparative study evaluated three participant groups on a 25-question wound care certification examination spanning four clinical domains (Diagnosis, Treatment, Complication Management, and Wound Subtype Knowledge). Participants included three MLLMs (Med-PaLM 2, LLaVA-Med, BioGPT), one unimodal LLM (ChatGPT-5.0), and four human clinical experts (a General Surgeon, a Wound Care Nurse, and two Internal Medicine Physicians). Statistical analyses included one-way ANOVA with Tukey's post-hoc tests and domain-specific Kruskal-Wallis comparisons.

Results: Human experts achieved the highest accuracy (86.0% ± 9.1%), followed by MLLMs (78.7% ± 12.2%), while ChatGPT-5.0 scored 64.0%, below the 70% certification threshold. Significant overall group differences were observed (F(2,5) = 8.42, p = 0.018, η² = 0.74). MLLMs significantly outperformed ChatGPT-5.0 (difference = 14.7 percentage points, p = 0.032, Cohen's d = 1.38), with the multimodal advantage most pronounced in visually dependent domains: Diagnosis (81% vs. 43%, p = 0.008) and Complication Management (72% vs. 50%, p = 0.034). No multimodal advantage was observed for the text-based Wound Subtype Knowledge domain (both 67%). Med-PaLM 2 achieved 92% accuracy, matching the Wound Care Nurse, while the General Surgeon achieved the highest overall performance (96%).

Conclusions: MLLMs demonstrate significant performance advantages over unimodal AI in wound care assessment, particularly for visually dependent clinical tasks. While human experts with specialized wound care experience maintain overall superiority, top-performing MLLMs approach expert-level accuracy, supporting their potential role as clinical decision-support tools.
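The Methods describe a one-way ANOVA across the three participant groups (yielding F(2,5), consistent with 8 participants in 3 groups) plus a Cohen's d effect size. A minimal sketch of that analysis, using hypothetical per-participant accuracy scores chosen only to be consistent with the group means reported above, not the study's actual data:

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical per-participant accuracies (fraction of 25 questions correct);
# illustrative values only, not the study's raw scores.
humans = np.array([0.96, 0.92, 0.80, 0.76])   # group mean 0.86
mllms = np.array([0.92, 0.76, 0.68])          # group mean ~0.787
unimodal = np.array([0.64])                   # the single ChatGPT-5.0 score

# One-way ANOVA across the three groups:
# df_between = 3 - 1 = 2, df_within = 8 - 3 = 5, i.e. F(2, 5).
F, p = f_oneway(humans, mllms, unimodal)

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Both groups need more than one member for a pooled SD, so the
# illustrative contrast here is humans vs. MLLMs.
d = cohens_d(humans, mllms)
print(f"F = {F:.2f}, p = {p:.3f}, d(humans vs MLLMs) = {d:.2f}")
```

The domain-level comparisons in the abstract would use `scipy.stats.kruskal` in the same way, since those per-domain samples are small and not assumed normal.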
