
Use of Large Language Models to Classify Epidemiological Characteristics in Synthetic and Real-World Social Media Posts About Conjunctivitis Outbreaks: Infodemiology Study


JMIR Publications Inc.

Michael S Deiner, Russell Y Deiner, Cherie Fathy, Natalie A Deiner, Vagelis Hristidis, Stephen D McLeod, Thomas J Bukowski, Thuy Doan, Gerami D Seitzman, Thomas M Lietman, Travis C Porco

Journal of Medical Internet Research

2025


Background: Web-based search and social media data can help identify epidemics, potentially earlier than clinical methods, and may even reveal unreported outbreaks. Monitoring for eye-related epidemics, such as conjunctivitis outbreaks, can facilitate early public health intervention to reduce transmission and ocular comorbidities. However, monitoring social media content for conjunctivitis outbreaks is costly and laborious. Large language models (LLMs) could overcome these barriers by assessing the likelihood that posts describe real-world outbreaks. Beyond that likelihood, public health actions for likely outbreaks could benefit from knowing additional epidemiological characteristics, such as outbreak type, size, and severity.

Objective: We aimed to assess whether, and how well, LLMs can classify epidemiological features of conjunctivitis outbreaks from social media posts beyond outbreak probability, including outbreak type, size, severity, etiology, and community setting. Our validation framework compared LLM classifications with those of other LLMs and of human experts.

Methods: We wrote code to generate synthetic social media posts about conjunctivitis outbreaks, embedding specific preclassified epidemiological features to simulate a range of infectious eye disease outbreak and control scenarios.
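The abstract does not describe how the synthetic posts were built. As a minimal sketch, assuming a full factorial design, the idea of embedding preclassified features can be illustrated by crossing hypothetical feature levels and rendering each combination through a template, so that every post carries its own ground-truth labels. The feature names, levels, and wording (`FEATURES`, `TYPE_PHRASES`, `generate_posts`) are illustrative assumptions, not the study's generator.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical feature levels (illustrative only, not the study's design).
FEATURES = {
    "outbreak_type": ["infectious", "allergic", "environmental"],
    "size": ["a few", "dozens of", "hundreds of"],
    "setting": ["school", "office", "nursing home"],
    "severity": ["mild", "severe"],
}

# Hypothetical phrases that signal each outbreak type in the post text.
TYPE_PHRASES = {
    "infectious": "pink eye is spreading",
    "allergic": "everyone's eyes are itchy from the pollen",
    "environmental": "our eyes are burning after a chemical leak",
}


@dataclass
class SyntheticPost:
    text: str
    # Ground-truth features embedded in the text, used later for scoring.
    labels: dict[str, str] = field(default_factory=dict)


def generate_posts() -> list[SyntheticPost]:
    """Render one post per combination of feature levels (full factorial)."""
    names = list(FEATURES)
    posts = []
    for levels in product(*(FEATURES[name] for name in names)):
        labels = dict(zip(names, levels))
        text = (
            f"At my {labels['setting']}, {TYPE_PHRASES[labels['outbreak_type']]}; "
            f"{labels['size']} people affected, symptoms mostly {labels['severity']}."
        )
        posts.append(SyntheticPost(text=text, labels=labels))
    return posts


if __name__ == "__main__":
    posts = generate_posts()
    print(len(posts))  # 54 combinations with these hypothetical levels
    print(posts[0].text)
```

With these hypothetical levels the generator yields 3 × 3 × 3 × 2 = 54 labeled posts. Varying the phrasing per combination, or adding posts that describe no outbreak as controls, would be natural extensions of the same pattern.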

We used these posts to develop effective LLM prompts and test the capabilities of multiple LLMs.
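Prompt development is not detailed in the abstract. One common pattern, shown here purely as an assumption, is to ask the model for a fixed JSON structure and validate each reply before scoring it against the labels embedded in the synthetic posts. The keys, scales, and function names (`build_prompt`, `parse_response`) are hypothetical, and no specific LLM API is called.

```python
import json

# Hypothetical output schema; not the study's actual prompt.
OUTBREAK_TYPES = {"infectious", "allergic", "environmental", "unclear"}
REQUIRED_KEYS = {"outbreak_probability", "outbreak_type", "severity"}

PROMPT_TEMPLATE = (
    "You are assisting with public health surveillance of conjunctivitis.\n"
    "Read the social media post below and reply with JSON only, using these keys:\n"
    '  "outbreak_probability": number from 0 to 100\n'
    '  "outbreak_type": one of infectious, allergic, environmental, unclear\n'
    '  "estimated_size": number of people affected, or null if not stated\n'
    '  "severity": integer from 1 (mild) to 5 (severe)\n'
    "\n"
    "Post: {post}\n"
)


def build_prompt(post: str) -> str:
    """Insert a post into the hypothetical classification prompt."""
    return PROMPT_TEMPLATE.format(post=post)


def parse_response(raw: str) -> dict:
    """Parse and validate a model's JSON reply; raise ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    probability = float(data["outbreak_probability"])
    if not 0 <= probability <= 100:
        raise ValueError("outbreak_probability must be in [0, 100]")

    outbreak_type = str(data["outbreak_type"]).lower()
    if outbreak_type not in OUTBREAK_TYPES:
        raise ValueError(f"unknown outbreak_type: {outbreak_type!r}")

    severity = int(data["severity"])
    if not 1 <= severity <= 5:
        raise ValueError("severity must be between 1 and 5")

    size = data.get("estimated_size")
    return {
        "outbreak_probability": probability,
        "outbreak_type": outbreak_type,
        "estimated_size": None if size is None else int(size),
        "severity": severity,
    }
```

Rejecting malformed replies up front keeps invalid outputs from silently entering the comparison statistics; they can instead be counted and retried separately.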

For the top-performing LLMs, we gauged practical utility for real-world epidemiological surveillance by comparing their assessments of conjunctivitis posts from Twitter/X, forums, and YouTube. Finally, human raters also classified the posts, and we compared their classifications with those of a leading LLM for validation. Comparisons used correlation statistics or sensitivity and specificity.
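The two kinds of comparison named above can be computed in a few lines of plain Python. This sketch assumes continuous scores (for example, outbreak probability) for the correlation and binary calls (for example, whether a given etiology is present) scored against a reference standard for sensitivity and specificity. Pearson correlation is an assumption here, since the abstract does not say which coefficient was used, and the function names are illustrative.

```python
from collections.abc import Sequence
from math import sqrt


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists (e.g., LLM vs reference)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def sensitivity_specificity(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> tuple[float, float]:
    """Score binary LLM calls against a reference standard (e.g., expert labels)."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have equal length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    fn = sum(1 for p, r in pairs if not p and r)      # missed positives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives
    fp = sum(1 for p, r in pairs if p and not r)      # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positives and negatives")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Conservative calls: no false alarms, but one missed positive.
    print(sensitivity_specificity([True, False, False, False], [True, True, False, False]))  # (0.5, 1.0)
```

The toy example mirrors the pattern of high specificity with lower sensitivity: a classifier that rarely flags an etiology makes few false alarms but misses some true cases.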

Results: We assessed how effectively 7 LLMs classified epidemiological data from 1152 synthetic posts, 370 Twitter/X posts, 290 forum posts, and 956 YouTube posts. Despite some discrepancies, the LLMs demonstrated a reliable capacity for nuanced epidemiological analysis across data sources, both when compared with human raters and when compared with one another. Notably, GPT-4 and Mixtral 8x22b performed well, predicting conjunctivitis outbreak characteristics such as probability (GPT-4: correlation=0.73), size (Mixtral 8x22b: correlation=0.82), and type (infectious, allergic, or environmentally caused), although there were notable exceptions. When synthetic and real-world posts were assessed for etiological factors, validation by infectious eye disease specialists showed that GPT-4 had high specificity (0.83-1.00) but variable sensitivity (0.32-0.71). Interrater reliability analyses showed that LLM-expert agreement exceeded expert-expert agreement for severity assessment (intraclass correlation coefficient=0.69 vs 0.38), whereas agreement on condition type varied (κ=0.37-0.94).
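The agreement statistics reported above can be sketched as follows. The abstract does not state which intraclass correlation form or which kappa variant was used; this example assumes the Shrout and Fleiss ICC(2,1) form (two-way random effects, absolute agreement, single rater) for severity ratings and Cohen's kappa for two raters' categorical condition-type labels.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters pick the same category independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is rater j's score for item i (e.g., a severity rating).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with >= 2 items and >= 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between items
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denominator
```

ICC suits ordinal or continuous severity scores and, in its absolute-agreement form, penalizes a rater who is systematically higher than another; kappa corrects raw agreement on nominal categories for chance. The values quoted in the abstract come from the study's data, not from this sketch.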

Conclusions: This investigation into the potential of LLMs for public health infoveillance suggests that they can effectively classify key epidemiological characteristics from social media content about conjunctivitis outbreaks. Future studies should further explore LLMs’ potential to support public health monitoring through the automated assessment and classification of potential infectious eye disease or other outbreaks. Their optimal role may be to act as a first line of documentation, alerting public health organizations to follow up on small, early outbreaks that LLMs have detected and classified, with a focus on the most severe ones.
