Topic Modeling for Amharic User Generated Texts
Topic modeling is a statistical process that derives latent themes from large collections of text. There are three approaches to topic modeling: unsupervised, semi-supervised, and supervised. In this work, we develop a supervised topic model for an Amharic corpus. We also investigate the effect of stemming on topic detection using Term Frequency-Inverse Document Frequency (TF-IDF) features, Latent Dirichlet Allocation (LDA) features, and a combination of the two feature sets, with four supervised machine learning algorithms: Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), and Neural Networks (NN). We evaluate our approach on an Amharic corpus of 14,751 documents in ten topic categories. Both qualitative and quantitative analyses show that the proposed supervised topic detection approach performs best with SVM, reaching an accuracy of 88% using state-of-the-art TF-IDF word features with the Synthetic Minority Over-sampling Technique (SMOTE) and no stemming. The results also show that text features with stemming slightly improve the performance of the topic classifier over features without stemming.
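TF-IDF word features weight each term by how often it occurs in a document and down-weight terms that occur in many documents. The Python sketch below illustrates that computation under stated assumptions: whitespace tokenization, the smoothed IDF formula idf(t) = ln((1 + N) / (1 + df(t))) + 1, and L2 normalization (the defaults of scikit-learn's TfidfVectorizer). The abstract does not give the paper's exact weighting or tokenizer. The function name `tfidf_vectors` is hypothetical, and so is the `stem` argument, which stands in for an Amharic stemmer so that the stemmed and unstemmed settings can be compared.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional


def tfidf_vectors(
    docs: List[str],
    stem: Optional[Callable[[str], str]] = None,
) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Illustrative sketch only; `stem` is a hypothetical stemmer hook.
    Passing stem=None corresponds to the 'no stemming' setting.
    """
    # Tokenize on whitespace (an assumption) and optionally stem each token.
    tokenized = []
    for doc in docs:
        tokens = doc.split()
        if stem is not None:
            tokens = [stem(t) for t in tokens]
        tokenized.append(tokens)

    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, matching scikit-learn's default TfidfVectorizer formula.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so documents of different lengths are comparable.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else vec)
    return vectors


# Usage with placeholder tokens (not data from the paper).
vecs = tfidf_vectors(["goal match goal", "election vote", "match vote"])
print(sorted(vecs[0]))  # ['goal', 'match']
```

With `stem=None`, every surface form is its own feature. A stemmer collapses inflected forms onto one term, which changes both the vocabulary and the document frequencies that the IDF weights depend on.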
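SMOTE, used in the best-performing configuration above, counters class imbalance by creating synthetic minority-class samples. Each new point is interpolated between a minority sample and one of its k nearest neighbours from the same class. The NumPy sketch below illustrates that idea on dense feature vectors. It is not the paper's implementation, which the abstract does not describe. The function name `smote` and its parameters `n_new`, `k`, and `seed` are hypothetical. In practice, a maintained implementation such as the `SMOTE` class in imbalanced-learn would normally be used instead.

```python
import numpy as np


def smote(X_min, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Return `n_new` synthetic rows for one minority class.

    Illustrative sketch; the name and parameters are hypothetical.
    Each synthetic row lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise squared Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    # Indices of the k nearest neighbours of every minority sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                    # starting samples
    pick = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                             # factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


# Usage: three minority samples forming a triangle in a 2-D feature space.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
synthetic = smote(X_min, n_new=4, k=2)
print(synthetic.shape)  # (4, 2)
```

In a multi-class setting such as this ten-topic corpus, the function would be called once per minority class, typically until each class matches the size of the largest one. Oversampling is applied to the training split only, so that synthetic points never appear in the evaluation data.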

