Afaan Oromo News Text Classification Using Deep Learning
Abstract
The recent development of the internet has significantly increased the availability and accessibility of Afaan Oromo texts online. Alongside the rapidly growing volume of information resources, there is a rising demand for more effective methods to find, filter, and organize these resources. Automatic text classification presents a viable solution to this challenge. Text classification, also known as text categorization, refers to the process of assigning predefined labels to text documents. This study uses deep learning algorithms with word embeddings for classifying Afaan Oromo news texts. Since feature extraction in news articles is often complex, deep learning provides a more effective approach compared to traditional methods. Earlier approaches typically relied on the bag-of-words model, which represents text as isolated words but ignores word order, an important factor in news classification. While these earlier models had relatively low time complexity, they failed to capture the context and semantic relationships between words. As the number of features and classes increased, their accuracy declined significantly. This study utilizes a dataset comprising 6,110 newly collected and annotated news articles for model training. Additionally, approximately 1,731,856 unannotated words were scraped from the Afaan Oromo news domain to develop a pre-trained word embedding model. Various natural language processing tasks, including text preprocessing steps such as normalization, tokenization, cleaning, and stop-word removal, were performed to prepare the data. For word representation, the Word2Vec embedding model, which predicts probabilistic word contexts, was selected due to its superior accuracy compared to FastText and other embedding approaches. Finally, the performance of the developed models was evaluated and compared. 
The CNN model achieved the highest accuracy of 98.4% and a precision of 98.4%, while the LSTM and BiLSTM models attained accuracies of 95% and 97.28%, with corresponding precisions of 94% and 97.36%, respectively.
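The preprocessing steps named in the abstract (normalization, tokenization, cleaning, and stop-word removal) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function names, the token pattern, and the short stop-word list are all assumptions made for this example, and the abstract does not specify the rules or the stop-word list the authors used.

```python
import re
import unicodedata

# Hypothetical stop-word list for illustration only; the study's actual list
# is not given in the abstract.
STOP_WORDS = {"fi", "kan", "akka", "irraa", "keessa"}

# Assumed token pattern: runs of Latin letters, optionally joined by an
# apostrophe (Afaan Oromo is written in Latin script and uses the apostrophe
# inside words). Digits and punctuation are dropped, which covers "cleaning".
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def normalize(text: str) -> str:
    """Unify Unicode forms and apostrophe variants, then lowercase."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    return text.lower()


def tokenize(text: str) -> list[str]:
    """Split normalized text into word tokens, discarding digits and punctuation."""
    return TOKEN_RE.findall(text)


def preprocess(text: str, stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Normalize, tokenize, and remove stop words from one document."""
    return [tok for tok in tokenize(normalize(text)) if tok not in stop_words]


if __name__ == "__main__":
    sample = "Oduu  Haaraa: Barattoonni 250 fi barsiisonni mana barumsaa keessa jiru!"
    print(preprocess(sample))
```

Token lists produced this way would be the input both for training the word embedding model and for the classifiers; a real pipeline would substitute a curated Afaan Oromo stop-word list and whatever normalization rules the corpus requires.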
Related Results
Afaan Oromo Multi-Label News Text Classification Using Deep Learning Approach
Abstract
Classification is a technique for categorizing textual data into a form of predefined categories. Due to its major consequences in regard to critical tasks such as...
Generational Wisdom: Lesson from the Oromo People
This review explores the foundational elements of Oromo generational wisdom, focusing on how their rich cultural heritage, particularly the Gadaa system, is passed down through gen...
Afaan Oromo Fake News Detection on Social Media: - Using Deep Learning Approach
Abstract
Due to the rapid growth of the internet in recent years, social media has made it easier to create and share information via computer-mediated technologies. As a r...
Bilingual Hate Speech Detection on Social Media : Amharic and Afaan Oromo
Abstract
Due to significant increases in internet penetration and the development of smartphone technology during the preceding couple of decades, many people have started ...
Enhancing Non-Formal Learning Certificate Classification with Text Augmentation: A Comparison of Character, Token, and Semantic Approaches
Aim/Purpose: The purpose of this paper is to address the gap in the recognition of prior learning (RPL) by automating the classification of non-formal learning certificates using d...
E-Press and Oppress
From elephants to ABBA fans, silicon to hormone, the following discussion uses a new research method to look at printed text, motion pictures and a te...
“Qeerroo” (Oromo Youth) – Initiator and Active Participant in the Protest Actions (Ethiopia)
For the first time in Ethiopian Studies, the article analyzes a new phenomenon in Ethiopia’s politics: the role of the Oromo youth (Qeerroo) in the protest actions of 2015–2018. ...
On Flores Island, do "ape-men" still exist? https://www.sapiens.org/biology/flores-island-ape-men/

