
A Comparative Study of the Accuracy and Readability of Responses from Four Generative AI Models to COVID-19-Related Questions

This study compares the accuracy and readability of Coronavirus Disease 2019 (COVID-19) prevention and control texts generated by four current generative artificial intelligence (AI) models: two international models (ChatGPT and Gemini) and two domestic (Chinese) models (Kimi and Ernie Bot). It also evaluates other performance characteristics of the texts generated by the domestic and international models. The questions and answers in the COVID-19 prevention guidelines issued by the U.S. Centers for Disease Control and Prevention (CDC) serve as the evaluation standard. The texts generated by each model are scored for accuracy, readability, and comprehensibility against the CDC answers. A neural network model is then used to identify the factors that affect readability, and text analysis is used to identify the medical topics of the generated texts. Finally, the AI-generated texts are scored manually through a questionnaire, and the manual scores are compared with the automated machine scores.

Accuracy: domestic models had higher textual accuracy, while international models had higher reliability. Readability: domestic models produced more fluent and publicly accessible language; international models generated more standardized and formally structured texts with greater consistency. Comprehensibility: domestic models offered superior readability, while international models were more stable in output. Readability factors: the average words per sentence (AWPS) emerged as the most significant factor influencing readability across all models. Topic analysis: ChatGPT emphasized epidemiological knowledge; Gemini focused on general medical and health topics; Kimi provided more multidisciplinary content; and Ernie Bot concentrated on clinical medicine.
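To make the readability factor concrete, the following is a minimal sketch of how AWPS can be computed and how it enters a readability index. It assumes English text and uses the standard Flesch-Kincaid Grade Level (FKGL) formula; the sentence splitter, the syllable heuristic, and the function names `awps` and `fkgl` are illustrative assumptions, not the authors' implementation.

```python
import re


def _sentences(text: str) -> list[str]:
    # Naive sentence split on ., ! and ? (an assumption for illustration).
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def _words(text: str) -> list[str]:
    # Treat runs of letters and apostrophes as words.
    return re.findall(r"[A-Za-z']+", text)


def _syllables(word: str) -> int:
    # Rough heuristic: one syllable per group of consecutive vowels, minimum 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def awps(text: str) -> float:
    """Average words per sentence."""
    sentences, words = _sentences(text), _words(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    return len(words) / len(sentences)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39 * AWPS + 11.8 * syllables per word - 15.59."""
    words = _words(text)
    syllables_per_word = sum(_syllables(w) for w in words) / len(words) if words else 0.0
    return 0.39 * awps(text) + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    short = "Wash your hands. Wear a mask."
    long_ = "Wash your hands and wear a mask."
    print(awps(short), fkgl(short))
    print(awps(long_), fkgl(long_))
```

Because AWPS appears directly in the formula with a positive coefficient, merging two short sentences into one longer sentence raises the grade level even when the vocabulary barely changes.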
The empirical results show that manual and machine scoring are highly consistent on the SimHash and FKGL indicators, which supports the effectiveness of the evaluation method proposed in this paper. Conclusion: Texts generated by domestic models are more accessible and better suited to public education, clinical communication, and health consultations. In contrast, the international models are more accurate when generating specialist content, especially for epidemiological studies and for literature assessing disease severity. The manual evaluation supports the reliability of the proposed assessment framework. It is therefore recommended that future AI-generated knowledge systems for infectious disease control balance professional rigor with public comprehensibility, so that they provide reliable and accessible reference materials during major infectious disease outbreaks.
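The abstract names SimHash as an indicator but does not describe how it is computed. The sketch below shows one common construction, assuming word-level tokens, MD5 token hashes, a 64-bit fingerprint, and a similarity defined as one minus the normalized Hamming distance. All of these choices, and the function names, are assumptions for illustration rather than the paper's method.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumed)


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint of the text."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        # Take the first 8 bytes of the MD5 digest as a 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def simhash_similarity(generated: str, reference: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical fingerprints."""
    distance = hamming_distance(simhash(generated), simhash(reference))
    return 1.0 - distance / BITS


if __name__ == "__main__":
    reference = "Wash your hands often with soap and water."
    generated = "Wash your hands frequently using soap and water."
    print(simhash_similarity(generated, reference))
```

Under this construction, a generated answer would be compared with the corresponding reference answer, and texts sharing most of their tokens tend to receive fingerprints that differ in only a few bits.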

MDPI AG

Zongjing Liang, Yun Kuang, Xiaobo Liang, Gongcheng Liang, Zhijie Li

COVID

2025



