Performance Evaluation of Hybrid Bert Model on Code-mixed for Hausa-English Using Adapted Pre-trained Data

This research evaluates the potential of the BERT (Bidirectional Encoder Representations from Transformers) language model on Hausa-English code-mixed text, using an adapted pre-trained dataset. The main aim was to assess the benefits of pre-trained models for handling code-mixed data, in order to improve language understanding and context sensitivity for Hausa-English. This objective was achieved by exploring different machine learning language models and developing a BERT model capable of handling a Hausa-English code-mixed dataset, training the chosen model on the adapted Hausa-English code-mixed data. The research was motivated by the scarcity of corpora for Hausa-English code-mixed text, whereas other language pairs, such as English-Hindi, have already been explored.

The model was developed using a Python transformer library. The adapted pre-trained dataset was first pre-processed and tokenized to fit the BERT model. The dataset was normalized in the context of code-mixed conversation, based on annotated language labels that distinguish the English and Hausa segments of the code-mixed text. Training parameters were set using different optimization strategies for fine-tuning, with the learning rate, batch size and number of training epochs adjusted to optimize performance.

The model was evaluated on accuracy, F1-score, precision and recall for code-mixed tasks. The proposed model, HauBERT, achieved more than 90% accuracy, and its results were compared with state-of-the-art BERT language models. The study recommends applying this adapted pre-trained model in large language models for language understanding and context sensitivity.
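
The four evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the task or label scheme, so the example assumes a token-level language identification setting with two hypothetical labels, "ha" (Hausa) and "en" (English). The function name `binary_metrics` and the sample labels are invented for illustration and are not data or code from the study.

```python
from collections import Counter


def binary_metrics(gold, pred, positive):
    """Accuracy, precision, recall and F1, treating `positive` as the target class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Tally the four confusion-matrix cells for the chosen positive class.
    counts = Counter()
    for g, p in zip(gold, pred):
        if p == positive and g == positive:
            counts["tp"] += 1
        elif p == positive:
            counts["fp"] += 1
        elif g == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1

    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical token-level labels (illustrative only, not from the study).
gold = ["ha", "ha", "en", "en", "ha", "en", "ha", "en"]
pred = ["ha", "en", "en", "en", "ha", "ha", "ha", "en"]

scores = binary_metrics(gold, pred, positive="ha")
print(scores)
```

Precision and recall are reported per class here. A study with more than two labels would typically average them across classes, and the abstract does not say which averaging scheme was used.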

Science Publishing Group

Ali Jakwa Faseki Franscisca Abubakar Ahmad Musa Ibrahim

Science Discovery Artificial Intelligence

2026
