Performance Evaluation of Hybrid Bert Model on Code-mixed for Hausa-English Using Adapted Pre-trained Data
This research evaluates the potential of the BERT (Bidirectional Encoder Representations from Transformers) language model on Hausa-English code-mixed text using an adapted pre-trained dataset. The main aim was to show the benefits of pre-trained models for handling code-mixed data, namely improved language understanding and context sensitivity for Hausa-English. This was achieved by exploring different machine learning language models and then developing a BERT model capable of handling a Hausa-English code-mixed dataset, trained on the adapted Hausa-English code-mixed data. The research was motivated by the scarcity of corpora for Hausa-English code-mixing, whereas other language pairs such as English-Hindi have been explored. The model was developed using the Python Transformers library. The adapted dataset was first pre-processed and tokenized to fit the BERT model's input format. It was normalized for code-mixed conversation using annotated language labels that distinguish the English and Hausa segments of the code-mixed text. Training parameters were set using different optimization strategies for fine-tuning, with learning rate, batch size and number of training epochs adjusted to optimize performance. The model was evaluated on accuracy, F1-score, precision and recall for code-mixed tasks. The proposed model, HauBERT, achieved more than 90% accuracy, and its results were compared with state-of-the-art BERT language models. The study recommends applying this adapted pre-trained model in large language models for language understanding and context sensitivity.
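The four evaluation metrics named in the abstract can be computed from gold and predicted labels, as in the minimal sketch below. The abstract does not describe the task labels or the evaluation code, so the function name `binary_metrics`, the language tags `"ha"` and `"en"`, and the toy data are illustrative assumptions rather than details from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(gold: Sequence[str], pred: Sequence[str], positive: str) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with one label treated as the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("no examples to evaluate")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)  # true positives
    fp = sum(1 for g, p in pairs if g != positive and p == positive)  # false positives
    fn = sum(1 for g, p in pairs if g == positive and p != positive)  # false negatives
    correct = sum(1 for g, p in pairs if g == p)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical token-level language tags for one code-mixed sentence (toy data).
gold = ["ha", "ha", "en", "en", "ha", "en"]
pred = ["ha", "en", "en", "en", "ha", "ha"]
scores = binary_metrics(gold, pred, positive="ha")
print(scores)
```

Reporting precision, recall and F1 alongside accuracy matters for code-mixed data because one language often dominates a corpus, and accuracy alone can look high even when the minority language is poorly recognized.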
Science Publishing Group

