Curation of a polysemous word dataset for word sense disambiguation in Hausa language
The challenge of Word Sense Disambiguation (WSD) is fundamental to Natural Language Processing (NLP), particularly in low-resource languages where lexical ambiguity hinders effective language understanding. Hausa, a major Chadic language spoken by over 60 million people, lacks structured lexical resources for disambiguating polysemous words. This paper presents the development and curation of a high-quality Hausa Polysemous Word Sense Disambiguation dataset consisting of 2,021 manually selected and annotated lemmas. Each lemma is disambiguated into its distinct senses, accompanied by contextual Hausa example sentences, English glosses, and translations. The dataset is designed to support the training and evaluation of supervised and semi-supervised WSD models for Hausa and serves as a foundational resource for semantic NLP tasks in low-resource settings. The annotation schema, curation methodology, and linguistic validation process are described in detail. This work fills a critical gap in Hausa NLP and provides a reproducible framework for constructing sense-annotated corpora in other under-resourced languages.
Journal of Statistical Sciences and Computational Intelligence
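The abstract describes each lemma as carrying several senses, each with a Hausa example sentence, an English gloss, and a translation. A minimal sketch of how one such record might be represented is given below. The class names, field names, and sense-identifier format are assumptions for illustration only, not the paper's published schema, and the values are placeholders rather than real dataset entries.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sense:
    """One sense of a lemma (field names are hypothetical)."""
    sense_id: str             # assumed identifier format, e.g. "LEMMA#1"
    english_gloss: str        # short English gloss of this sense
    example_hausa: str        # Hausa sentence using the lemma in this sense
    example_translation: str  # English translation of the example sentence


@dataclass
class LemmaEntry:
    """A lemma together with its annotated senses."""
    lemma: str
    senses: List[Sense] = field(default_factory=list)

    def is_polysemous(self) -> bool:
        # A lemma counts as polysemous here if it has at least two senses.
        return len(self.senses) >= 2


def to_training_rows(entry: LemmaEntry) -> List[Tuple[str, str, str]]:
    """Flatten an entry into (lemma, context sentence, sense label) rows,
    the shape a supervised WSD classifier would typically consume."""
    return [(entry.lemma, s.example_hausa, s.sense_id) for s in entry.senses]


# Placeholder values only; these are not entries from the dataset.
entry = LemmaEntry(
    lemma="LEMMA",
    senses=[
        Sense("LEMMA#1", "first gloss", "HAUSA SENTENCE 1", "translation 1"),
        Sense("LEMMA#2", "second gloss", "HAUSA SENTENCE 2", "translation 2"),
    ],
)
rows = to_training_rows(entry)
```

Keeping the sense label next to its context sentence means the same record can serve both as a dictionary-style entry and as a labelled training example.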

