
A Comparative Study of the Accuracy and Readability of Responses from Four Generative AI Models to COVID-19-Related Questions

Purpose: This study compares the accuracy and readability of Coronavirus Disease 2019 (COVID-19) prevention and control texts generated by four current generative artificial intelligence (AI) models, two international (ChatGPT and Gemini) and two domestic (Kimi and Ernie Bot), and evaluates other performance characteristics of the texts produced by the domestic and international models.

Methods: The questions and answers in the COVID-19 prevention guidelines issued by the U.S. Centers for Disease Control and Prevention (CDC) serve as the evaluation criteria. The accuracy, readability, and comprehensibility of the texts generated by each model are scored against the CDC standards. A neural network model is then used to identify the factors that affect readability, and the medical topics of the generated texts are analyzed using text analysis techniques. Finally, the AI-generated texts are scored manually through a questionnaire, and the manual scores are compared with the automated machine scores.

Accuracy: domestic models have higher textual accuracy, while international models have higher reliability. Readability: domestic models produced more fluent and publicly accessible language; international models generated more standardized and formally structured texts with greater consistency. Comprehensibility: domestic models offered superior readability, while international models were more stable in output. Readability factors: the average words per sentence (AWPS) emerged as the most significant factor influencing readability across all models. Topic analysis: ChatGPT emphasized epidemiological knowledge; Gemini focused on general medical and health topics; Kimi provided more multidisciplinary content; and Ernie Bot concentrated on clinical medicine. Manual and machine scoring were highly consistent on the SimHash and Flesch-Kincaid Grade Level (FKGL) indicators, which supports the effectiveness of the evaluation method proposed in this paper.

Conclusion: Texts generated by domestic models are more accessible and better suited for public education, clinical communication, and health consultations. In contrast, the international models are more accurate when generating specialist content, especially for epidemiological studies and for assessing the literature on disease severity. The inclusion of manual evaluations confirms the reliability of the proposed assessment framework. It is therefore recommended that future AI-generated knowledge systems for infectious disease control balance professional rigor with public comprehensibility, in order to provide reliable and accessible reference materials during major infectious disease outbreaks.
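
The abstract names FKGL and AWPS without giving formulas. As a minimal sketch, assuming the standard Flesch-Kincaid Grade Level formula (0.39 × words per sentence + 11.8 × syllables per word − 15.59), the Python below shows how AWPS enters the score directly. The function names, the regex-based tokenization, and the vowel-group syllable heuristic are illustrative assumptions, not the authors' implementation.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return AWPS, average syllables per word, and FKGL for an English text."""
    # Naive sentence split on ., !, ? -- abbreviations such as "U.S." will
    # be over-split, so real texts need a proper sentence tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    awps = len(words) / len(sentences)  # average words per sentence
    aspw = sum(count_syllables(w) for w in words) / len(words)
    fkgl = 0.39 * awps + 11.8 * aspw - 15.59  # standard FKGL formula
    return {"awps": awps, "aspw": aspw, "fkgl": fkgl}


if __name__ == "__main__":
    short = readability("Wash your hands. Wear a mask.")
    merged = readability("Wash your hands and wear a mask.")
    print(short)
    print(merged)
```

With syllables per word held roughly constant, merging two short sentences into one raises AWPS and therefore the grade level, which is consistent with AWPS being reported as the dominant readability factor.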
