A Hybrid Lexicon–Transformer Framework for Sentiment, Emotion, and Context Classification in Moroccan Darija (TriLex-Darija)

This paper introduces TriLex-Darija, a large-scale affective lexicon suite and a hybrid lexicon–transformer framework for analyzing Moroccan Arabic (Darija) social media text across three complementary dimensions: sentiment, emotion, and pragmatic context. The resource is constructed from a corpus of 288,709 manually annotated comments and consists of three unigram lexicons, each mapping 147,565 words to normalized probability distributions over task-specific labels. We first evaluate a symbolic lexicon-based classifier (without machine learning) based on word-level score aggregation to assess the intrinsic quality of the proposed TriLex-Darija resource. Despite the absence of contextual modeling, this approach achieves competitive performance, demonstrating that corpus-derived lexical knowledge captures substantial affective information in Moroccan Darija. To further improve performance, we propose a unified hybrid framework that combines TriLex-Darija features with contextual embeddings extracted from MARBERT. All models are trained using a consistent LinearSVC classifier to ensure fair comparison and reproducibility. In addition to the symbolic model, we evaluate a lexicon-feature-based LinearSVC model, allowing a clear distinction between symbolic, feature-based, and hybrid approaches. Experimental results show that the hybrid model consistently outperforms both BERT-only and lexicon-feature-based baselines across all tasks. For sentiment classification, the hybrid model achieves a macro F1-score of 72.96%, compared to 59.15% for BERT-only and 67.94% for the lexicon-feature-based model. For emotion classification, it reaches 92.92%, outperforming BERT-only (79.17%) and lexicon-feature-based (89.55%) models. For pragmatic context classification, the hybrid model achieves 91.35%, compared to 75.93% for BERT-only and 87.72% for the lexicon-feature-based model. Bootstrap confidence intervals (95%) and McNemar’s tests confirm that all improvements are statistically significant (p < 0.001). Overall, TriLex-Darija demonstrates that combining lexical knowledge with contextual embeddings leads to robust, interpretable, and statistically validated affective models for Moroccan Darija in low-resource settings.
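The symbolic lexicon-based classifier described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out how the word-level distributions are estimated or aggregated, so everything below is an assumption made for illustration: whitespace tokenization, relative label frequencies as the normalized distributions, and a plain sum of per-label scores followed by an argmax. The function names `build_lexicon` and `classify` and the toy corpus are hypothetical and are not taken from the TriLex-Darija resource.

```python
from collections import Counter, defaultdict


def build_lexicon(corpus):
    """Map each word to a normalized distribution over labels.

    corpus: iterable of (text, label) pairs.
    Assumption: whitespace tokenization and relative label frequencies.
    """
    counts = defaultdict(Counter)
    for text, label in corpus:
        for word in text.split():
            counts[word][label] += 1

    lexicon = {}
    for word, label_counts in counts.items():
        total = sum(label_counts.values())
        lexicon[word] = {lab: n / total for lab, n in label_counts.items()}
    return lexicon


def classify(text, lexicon, labels, default=None):
    """Sum per-label word scores and return (best_label, scores).

    Words missing from the lexicon contribute nothing. If no word is
    known, the `default` label is returned.
    """
    scores = {lab: 0.0 for lab in labels}
    for word in text.split():
        for lab, prob in lexicon.get(word, {}).items():
            scores[lab] += prob
    if not any(scores.values()):
        return default, scores
    best = max(labels, key=lambda lab: scores[lab])
    return best, scores


# Toy, invented examples in romanized form (not drawn from the paper's corpus).
toy_corpus = [
    ("film zwin bzaf", "positive"),
    ("film khayb", "negative"),
    ("khayb bzaf", "negative"),
    ("zwin", "positive"),
]
labels = ["negative", "positive"]

lexicon = build_lexicon(toy_corpus)
label, scores = classify("film zwin", lexicon, labels)
print(label, scores)
```

In this toy setting a word seen under both labels (such as `film`) contributes equally to each, while a word seen under only one label pushes the decision toward it. The same per-label scores could also be used as a feature vector for a downstream classifier, which is one plausible reading of the lexicon-feature-based model mentioned in the abstract.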

Related Results

Multimodal Emotion Recognition and Human Computer Interaction for AI-Driven Mental Health Support (Preprint)
BACKGROUND Mental health has become one of the most urgent global health issues of the twenty-first century. The World Health Organization (WHO) reports tha...
MDVC corpus: empowering Moroccan Darija speech recognition
Automatic speech recognition (ASR) technology has significantly transformed human-machine interactions, but it remains limited in its representation of diverse languages and dialec...
Automatic Load Sharing of Transformer
Transformer plays a major role in the power system. It works 24 hours a day and provides power to the load. The transformer is excessive full, its windings are overheated which lea...
Računalno potpomognuto usmjeravanje kod dvojezičnih govornika
This thesis investigates whether modern computer models can confirm how people encounter words and then use these findings in didactics. In recent years, computers have been used i...
Learning Domain-specific Sentiment Lexicon with Supervised Sentiment-aware LDA
Analyzing and understanding people's sentiments towards different topics has become an interesting task due to the explosion of opinion-rich resources. In most sentiment analysis a...
High frequency modeling of power transformers under transients
This thesis presents the results related to high frequency modeling of power transformers. First, a 25kVA distribution transformer under lightning surges is tested in the laborator...
Arabic Darija dialect on the YouTube account of Aisha Devia official: A sociolinguistic approach
This study aims to explain the factors behind the emergence of the Darija dialect in Morocco and to describe the types of Moroccan dialects, especially on Aisha Devi's Official You...
