
Curation of a polysemous word dataset for word sense disambiguation in Hausa language

The challenge of Word Sense Disambiguation (WSD) is fundamental to Natural Language Processing (NLP), particularly in low-resource languages where lexical ambiguity hinders effective language understanding. Hausa, a major Chadic language spoken by over 60 million people, lacks structured lexical resources for disambiguating polysemous words. This paper presents the development and curation of a high-quality Hausa Polysemous Word Sense Disambiguation dataset consisting of 2,021 manually selected and annotated lemmas. Each lemma is disambiguated into its distinct senses, accompanied by contextual Hausa example sentences, English glosses, and translations. The dataset is designed to support the training and evaluation of supervised and semi-supervised WSD models for Hausa and serves as a foundational resource for semantic NLP tasks in low-resource settings. The annotation schema, curation methodology, and linguistic validation process are described in detail. This work fills a critical gap in Hausa NLP and provides a reproducible framework for constructing sense-annotated corpora in other under-resourced languages.
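As a rough illustration of the entry structure the abstract describes (a lemma, its distinct senses, and for each sense a Hausa example sentence, an English gloss, and a translation), one record might be modelled as in the sketch below. The class names, field names, sense-ID format, and the `to_wsd_instances` helper are all assumptions made for illustration; the paper's actual annotation schema is not reproduced here, and the strings are placeholders rather than data from the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sense:
    """One sense of a lemma. Field names are hypothetical, not the paper's schema."""
    sense_id: str             # assumed label format, e.g. "<lemma>.<n>"
    english_gloss: str        # short English gloss for this sense
    hausa_example: str        # Hausa sentence using the lemma in this sense
    english_translation: str  # English translation of the example sentence


@dataclass
class LemmaEntry:
    """A lemma together with its annotated senses."""
    lemma: str
    senses: List[Sense] = field(default_factory=list)

    def is_polysemous(self) -> bool:
        # A lemma counts as polysemous here if it has at least two senses.
        return len(self.senses) >= 2


def to_wsd_instances(entry: LemmaEntry) -> List[Tuple[str, str, str]]:
    """Flatten an entry into (lemma, context sentence, sense label) triples,
    the shape a supervised WSD classifier would typically train on."""
    return [(entry.lemma, s.hausa_example, s.sense_id) for s in entry.senses]


# Placeholder record: the strings stand in for real annotations.
entry = LemmaEntry(
    lemma="LEMMA",
    senses=[
        Sense("LEMMA.1", "gloss of sense 1",
              "Hausa sentence for sense 1", "English translation 1"),
        Sense("LEMMA.2", "gloss of sense 2",
              "Hausa sentence for sense 2", "English translation 2"),
    ],
)

instances = to_wsd_instances(entry)
```

Keeping the gloss and translation on each sense, rather than on the lemma, means every example sentence carries its own label, which is what supervised and semi-supervised WSD training both need.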

