
Depression subtype classification from social media posts: few-shot prompting vs. fine-tuning of large language models

Background Social media provides timely proxy signals of mental health, but reliable tweet-level classification of depression subtypes remains challenging because of short, noisy text, overlapping symptomatology, and labeling bias. Large language models (LLMs) are increasingly used in mental health for tasks such as symptom extraction, risk screening, and triage, yet their reliability for fine-grained depression subtype classification from brief social media posts remains underexplored.

Objective We benchmarked few-shot, prompt-only LLMs against parameter-efficient fine-tuned encoders for identifying depression subtypes in posts on X (formerly Twitter).

Methods We used a curated dataset of 14,983 English-language tweets stratified into six clinically grounded categories: five depression subtypes (postpartum, major, bipolar, psychotic, atypical) and a no-depression class. We compared (i) instruction-tuned causal LLMs in a few-shot setting with (ii) supervised fine-tuning of transformer encoders (e.g., RoBERTa, DeBERTa, BERTweet), using identical data splits and metrics. The primary evaluation metric was macro-F1, with accuracy, precision, and recall as secondary metrics. For the best-performing model in each model family, we also report per-class precision, recall, and F1 scores, along with confusion matrices.

Results Few-shot LLMs achieved macro-F1 = 0.73–0.77 (best: Llama-3-8B, accuracy 0.75). Fine-tuned encoders consistently outperformed prompt-only models, reaching macro-F1 = 0.94–0.96 (best: RoBERTa-large, accuracy 0.954). Relative improvements were largest for the clinically challenging classes. Fine-tuning raised F1 for the postpartum and psychotic subtypes to ≈0.99 (substantially above few-shot) and boosted major-depression recall from ≈0.53–0.60 to ≈0.95–0.97. Error analyses showed that prompt-only models frequently misclassified major and atypical depression as bipolar, a pattern that fine-tuning substantially reduced.
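Because macro-F1 is the unweighted mean of per-class F1, each of the six categories counts equally, so a model cannot score well by doing well on only some classes. The following is a minimal, dependency-free Python sketch of how macro-F1, accuracy, per-class precision/recall/F1, and a confusion matrix can be computed under that usual definition (the same averaging as scikit-learn's `average="macro"`). It is not the authors' evaluation code; the label strings and the toy predictions are hypothetical.

```python
# Minimal macro-F1 evaluation sketch (not the study's code). Label strings are
# hypothetical stand-ins for the six categories named in the abstract.
LABELS = ["postpartum", "major", "bipolar", "psychotic", "atypical", "no_depression"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def per_class_scores(matrix, labels=LABELS):
    """Return {label: (precision, recall, f1)}; undefined ratios count as 0."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as this label, but wrong
        fn = sum(matrix[i]) - tp                 # truly this label, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_scores(confusion_matrix(y_true, y_pred, labels), labels)
    return sum(f1 for _, _, f1 in scores.values()) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of posts whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy predictions (invented): major and atypical posts mislabeled as bipolar.
    labels = ["major", "bipolar", "atypical"]
    y_true = ["major", "major", "atypical", "bipolar"]
    y_pred = ["major", "bipolar", "bipolar", "bipolar"]
    cm = confusion_matrix(y_true, y_pred, labels)
    print(cm)
    print(per_class_scores(cm, labels))
    print(f"macro-F1 = {macro_f1(y_true, y_pred, labels):.3f}")
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
```

In the toy example, one major-depression post and one atypical post are predicted as bipolar, a miniature version of the confusion pattern the error analysis describes. The atypical class ends up with F1 = 0, which pulls macro-F1 down to about 0.39 even though accuracy is 0.5.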
Conclusions For tweet-level depression subtyping, task-specific adaptation via fine-tuning yields substantially higher and more stable performance than few-shot prompting, particularly for nuanced, clinically anchored classes. These findings support fine-tuned encoders as strong, compute-efficient baselines for depression subtype classification from social media.
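The abstract describes the fine-tuned encoders as parameter-efficient and compute-efficient but does not name the adaptation method. To give a rough sense of scale only, the sketch below counts trainable parameters assuming LoRA (low-rank adaptation) with rank 8 on the query and value projections of RoBERTa-large (24 layers, hidden size 1024), plus a fully trained classification head. The method, rank, target modules, head layout, and the approximate 355M base size are all assumptions, not details reported by the study.

```python
# Hypothetical LoRA setup: the abstract says "parameter-efficient" but does not
# name the method, rank, or target modules; every choice below is an assumption.
HIDDEN = 1024               # RoBERTa-large hidden size
LAYERS = 24                 # RoBERTa-large transformer layers
NUM_LABELS = 6              # five depression subtypes + no-depression
TOTAL_BASE = 355_000_000    # approximate RoBERTa-large parameter count


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """One LoRA adapter adds A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def trainable_params(rank: int = 8, matrices_per_layer: int = 2) -> int:
    """Adapters on query and value projections, plus an assumed trainable head."""
    adapters = LAYERS * matrices_per_layer * lora_params(HIDDEN, HIDDEN, rank)
    # Assumed head layout: dense (H x H + bias) followed by out_proj (H x labels + bias).
    head = (HIDDEN * HIDDEN + HIDDEN) + (HIDDEN * NUM_LABELS + NUM_LABELS)
    return adapters + head


if __name__ == "__main__":
    n = trainable_params()
    print(f"trainable: {n:,} ({100 * n / TOTAL_BASE:.2f}% of ~355M)")
```

Under these assumptions it reports about 1.84 million trainable parameters, roughly 0.52% of the base model; the real figure depends entirely on the configuration the study actually used.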

Frontiers Media SA

Rawan AlSaad, Sulaiman Alshakhs, Rajat Thomas

Frontiers in Digital Health

2026
