A Hybrid Lexicon–Transformer Framework for Sentiment, Emotion, and Context Classification in Moroccan Darija (TriLex-Darija)
This paper introduces TriLex-Darija, a large-scale affective lexicon suite and a hybrid lexicon–transformer framework for analyzing Moroccan Arabic (Darija) social media text across three complementary dimensions: sentiment, emotion, and pragmatic context. The resource is constructed from a corpus of 288,709 manually annotated comments and consists of three unigram lexicons, each mapping 147,565 words to normalized probability distributions over task-specific labels.
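As a rough illustration of what "mapping words to normalized probability distributions over task-specific labels" can look like, the sketch below builds a tiny unigram lexicon from labeled comments. The abstract does not state the estimator, so the relative-frequency estimate, the whitespace tokenizer, the function name `build_lexicon`, and the toy comments are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_lexicon(comments, labels):
    """Map each word to a normalized distribution over labels.

    Assumption: P(label | word) is estimated as the share of the word's
    occurrences that fall in comments carrying that label.
    """
    counts = defaultdict(Counter)
    for text, label in zip(comments, labels):
        for word in text.split():  # assumption: whitespace tokenization
            counts[word][label] += 1

    label_set = sorted(set(labels))
    lexicon = {}
    for word, per_label in counts.items():
        total = sum(per_label.values())
        lexicon[word] = {lab: per_label[lab] / total for lab in label_set}
    return lexicon


# Toy, invented data (romanized for readability); not from the paper's corpus.
comments = ["zwin bzaf", "khayb bzaf", "zwin"]
labels = ["positive", "negative", "positive"]
lexicon = build_lexicon(comments, labels)
print(lexicon["bzaf"])
```

Under this assumption, a word that appears equally often under two labels receives an even split, and every word's scores sum to one.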
We first evaluate a symbolic lexicon-based classifier (without machine learning) based on word-level score aggregation to assess the intrinsic quality of the proposed TriLex-Darija resource. Despite the absence of contextual modeling, this approach achieves competitive performance, demonstrating that corpus-derived lexical knowledge captures substantial affective information in Moroccan Darija.
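A symbolic classifier based on word-level score aggregation could be sketched as follows. The abstract does not give the aggregation rule, so summing the per-word distributions and taking the highest-scoring label is an assumption, as are the function name `classify`, the handling of out-of-lexicon words, and the toy lexicon values.

```python
def classify(text, lexicon, labels, default=None):
    """Sum per-word label scores and return the best label.

    Assumptions: words missing from the lexicon are skipped, and `default`
    is returned when no word in the text is covered.
    """
    totals = {lab: 0.0 for lab in labels}
    covered = False
    for word in text.split():  # assumption: whitespace tokenization
        dist = lexicon.get(word)
        if dist is None:
            continue
        covered = True
        for lab in labels:
            totals[lab] += dist.get(lab, 0.0)
    if not covered:
        return default
    return max(labels, key=lambda lab: totals[lab])


# Toy, invented lexicon entries; not values from TriLex-Darija.
toy_lexicon = {
    "zwin": {"positive": 0.9, "negative": 0.1},
    "khayb": {"positive": 0.2, "negative": 0.8},
    "bzaf": {"positive": 0.5, "negative": 0.5},
}
toy_labels = ["negative", "positive"]

print(classify("zwin bzaf", toy_lexicon, toy_labels))
print(classify("khayb bzaf", toy_lexicon, toy_labels))
```

No parameters are fitted here, which is what makes such a classifier a direct probe of the lexicon's quality.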
To further improve performance, we propose a unified hybrid framework that combines TriLex-Darija features with contextual embeddings extracted from MARBERT. All trained models use the same LinearSVC classifier to ensure fair comparison and reproducibility. In addition to the symbolic model, we evaluate a lexicon-feature-based LinearSVC model, which allows a clear distinction between symbolic, feature-based, and hybrid approaches.
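One plausible way to combine lexicon features with contextual embeddings is simple concatenation, sketched below. The abstract does not describe the fusion step, so the concatenation, the choice of lexicon features (mean label scores plus lexicon coverage), the function names, and the three-dimensional stand-in embedding are assumptions; a real sentence embedding from MARBERT would be a much longer vector.

```python
def lexicon_features(text, lexicon, labels):
    """Mean label score over covered words, plus the share of covered words."""
    words = text.split()  # assumption: whitespace tokenization
    known = [lexicon[w] for w in words if w in lexicon]
    if not known:
        return [0.0] * (len(labels) + 1)
    mean_scores = [
        sum(dist.get(lab, 0.0) for dist in known) / len(known) for lab in labels
    ]
    coverage = len(known) / len(words)
    return mean_scores + [coverage]


def hybrid_features(text, embedding, lexicon, labels):
    """Concatenate a precomputed sentence embedding with lexicon features."""
    return list(embedding) + lexicon_features(text, lexicon, labels)


# Toy, invented values; the embedding stands in for a MARBERT sentence vector.
feat_lexicon = {
    "zwin": {"positive": 0.9, "negative": 0.1},
    "bzaf": {"positive": 0.5, "negative": 0.5},
}
feat_labels = ["negative", "positive"]
stand_in_embedding = [0.1, -0.2, 0.3]

vector = hybrid_features("zwin bzaf", stand_in_embedding, feat_lexicon, feat_labels)
print(vector)
```

Vectors built this way for every comment can be stacked into a feature matrix and passed to a linear classifier such as scikit-learn's `LinearSVC`, the classifier named in the abstract.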
Experimental results show that the hybrid model consistently outperforms both BERT-only and lexicon-feature-based baselines across all tasks. For sentiment classification, the hybrid model achieves a macro F1-score of 72.96%, compared to 59.15% for BERT-only and 67.94% for the lexicon-feature-based model. For emotion classification, it reaches 92.92%, outperforming BERT-only (79.17%) and lexicon-feature-based (89.55%) models. For pragmatic context classification, the hybrid model achieves 91.35%, compared to 75.93% for BERT-only and 87.72% for the lexicon-feature-based model.
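For readers less familiar with the metric, macro F1 is the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The sketch below computes it from scratch on invented labels; the function name `macro_f1` and the toy predictions are illustrative only and unrelated to the reported figures.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, invented labels.
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
score = macro_f1(gold, pred)
print(round(score, 4))
```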
Bootstrap confidence intervals (95%) and McNemar’s tests confirm that all improvements are statistically significant (p < 0.001). Overall, TriLex-Darija demonstrates that combining lexical knowledge with contextual embeddings leads to robust, interpretable, and statistically validated affective models for Moroccan Darija in low-resource settings.
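The two statistical checks mentioned above can be sketched with the standard library alone. The abstract does not say which McNemar variant or bootstrap scheme was used, so the exact binomial form of McNemar's test, the percentile bootstrap, the function names, and the toy predictions below are assumptions.

```python
import math
import random


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on the cases where the models disagree."""
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree on correctness
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric`, resampling examples."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


# Toy, invented predictions: model B fixes five of model A's errors.
truth = [1] * 10
model_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_b = [1] * 10

p_value = mcnemar_exact(truth, model_a, model_b)
low, high = bootstrap_ci(truth, model_a, accuracy)
print(p_value, low, high)
```

McNemar's test looks only at examples where exactly one of the two models is correct, which suits paired comparisons on a shared test set; the bootstrap interval conveys how much a score could vary under resampling of that set.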
Association for Computing Machinery (ACM)

