Improving the Accuracy of Text Classification using Stemming Method, A Case of Non-Formal Indonesian Conversation

Abstract
Background: Stemming has long been used in data pre-processing for information retrieval, reducing affixed words to their root forms. Existing stemming methods for Indonesian have been studied and shown to achieve high accuracy. However, few stemming methods exist for non-formal Indonesian text. This study introduces a new stemming method for pre-processing non-formal Indonesian text, with the aim of improving the accuracy of text classifier models through better stemming. A text classifier model is built with the Support Vector Machine algorithm, and its accuracy is measured. The experimental evaluation tested 550 Indonesian datasets using two different stemming methods.
Findings: With the proposed stemming method, the text classifier model achieves an accuracy of 0.85, compared with 0.73 using the existing method. These results indicate that the proposed stemming method yields a classifier model with a smaller error rate, so it predicts the class of an object more accurately.
Conclusion: Existing Indonesian stemming methods are still oriented toward formal Indonesian sentences and are therefore of limited use for non-formal sentences. This motivates building a corpus that normalizes non-formal Indonesian into formal Indonesian and using it as an improved stemming method. Using this corpus for stemming can improve the accuracy of the classifier model. In the future, the proposed corpus and stemming method can be applied to other Indonesian text-processing tasks, including text clustering, summarization, and hate-speech detection.
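The abstract describes a two-step process: a corpus first normalizes non-formal Indonesian words into formal ones, and stemming happens afterwards. The paper's corpus and rules are not given here. The following is therefore only a minimal Python sketch under stated assumptions:

- `NORMALIZATION_CORPUS` and `ROOT_WORDS` are hypothetical toy tables.
- `stem` is a naive, dictionary-checked affix stripper. It skips the nasal prefixes `me-`/`pe-`, whose sound changes need recoding rules.
- Neither function is the proposed method or any published stemmer.

```python
import re

# Hypothetical normalization corpus: non-formal token -> formal Indonesian word.
# Illustrative entries only; the paper's actual corpus is not reproduced here.
NORMALIZATION_CORPUS = {
    "gak": "tidak",
    "nggak": "tidak",
    "udah": "sudah",
    "banget": "sangat",
    "bgt": "sangat",
    "aja": "saja",
}

# Hypothetical root-word lexicon; a real stemmer would consult a full dictionary.
ROOT_WORDS = {"makan", "baca", "buku", "enak", "tidak", "sudah", "sangat", "saja"}

# Suffixes in the order they are peeled off the end of a word:
# particles, then possessive pronouns, then derivational suffixes.
SUFFIX_GROUPS = (("lah", "kah"), ("nya", "ku", "mu"), ("kan", "an"))
# Prefixes that attach without sound changes; me-/pe- are deliberately omitted.
PREFIXES = ("ber", "ter", "di", "ke", "se")
MIN_ROOT = 3  # never strip a suffix if fewer than three letters would remain


def normalize(token: str) -> str:
    """Replace a non-formal token with its formal form when the corpus has one."""
    return NORMALIZATION_CORPUS.get(token, token)


def stem(word: str) -> str:
    """Naive dictionary-checked affix stripping; not a complete Indonesian stemmer."""
    # Peel suffixes from the outside in, keeping every intermediate form.
    candidates = [word]
    current = word
    for group in SUFFIX_GROUPS:
        for suffix in group:
            if current.endswith(suffix) and len(current) - len(suffix) >= MIN_ROOT:
                current = current[: -len(suffix)]
                candidates.append(current)
                break
    # Accept the first candidate that is a known root, with or without one prefix.
    for candidate in candidates:
        if candidate in ROOT_WORDS:
            return candidate
        for prefix in PREFIXES:
            if candidate.startswith(prefix) and candidate[len(prefix):] in ROOT_WORDS:
                return candidate[len(prefix):]
    return word  # no known root found: leave the word unchanged


def preprocess(text: str) -> list[str]:
    """Lowercase and tokenize, then normalize each token before stemming it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(normalize(t)) for t in tokens]


print(preprocess("Bukunya udah dibaca"))
print(preprocess("Makanannya enak banget"))
```

Normalizing before stemming means that *banget* reaches the stemmer as *sangat*, a word the root lexicon recognizes. Checking candidates against the lexicon also guards against over-stemming: a bare suffix rule would turn *makan* into *mak* by removing `-an`.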
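The evaluation trains a Support Vector Machine on the pre-processed text and compares classifier accuracy. The abstract does not describe the SVM configuration, feature representation, or data. The sketch below therefore rests on several assumptions:

- Features are bag-of-words counts.
- The model is a linear SVM trained with the Pegasos stochastic sub-gradient method, written without third-party libraries.
- The four documents and their +1/-1 labels are invented toy data, not the paper's.
- Helper names such as `train_linear_svm` are hypothetical.

```python
import random
from collections import Counter


def build_vocabulary(docs: list[list[str]]) -> dict[str, int]:
    """Assign each distinct token a column index."""
    vocab: dict[str, int] = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(tokens: list[str], vocab: dict[str, int]) -> list[float]:
    """Bag-of-words term counts plus a constant 1.0 feature acting as the bias."""
    x = [0.0] * (len(vocab) + 1)
    for tok, count in Counter(tokens).items():
        if tok in vocab:  # tokens unseen during training are ignored
            x[vocab[tok]] = float(count)
    x[-1] = 1.0
    return x


def dot(a: list[float], b: list[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. The bias column is regularized together with the
    other weights, a simplification that is acceptable for a sketch.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * dot(w, X[i]) < 1  # inside the margin or misclassified
            w = [(1 - eta * lam) * wj for wj in w]  # step on the regularizer term
            if violates:  # step on the hinge-loss term
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: list[float], x: list[float]) -> int:
    return 1 if dot(w, x) >= 0 else -1


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy, already-stemmed documents with hypothetical labels; not the paper's data.
train_docs = [
    ["makan", "enak", "sangat"],
    ["buku", "bagus", "sangat"],
    ["makan", "tidak", "enak"],
    ["buku", "tidak", "bagus"],
]
train_labels = [1, 1, -1, -1]

vocab = build_vocabulary(train_docs)
X = [vectorize(doc, vocab) for doc in train_docs]
w = train_linear_svm(X, train_labels)
preds = [predict(w, x) for x in X]
print("accuracy:", accuracy(train_labels, preds))
print("error rate:", 1 - accuracy(train_labels, preds))
```

Accuracy is the fraction of correctly predicted labels, and the error rate is its complement. The reported accuracies of 0.85 and 0.73 therefore correspond to error rates of 0.15 and 0.27. In practice, a library implementation such as scikit-learn's `LinearSVC` would usually replace the hand-written trainer.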

Related Results

Functions of Communicatively Relevant Silence in German (Funkcije komunikacijski relevantne šutnje u njemačkome)
Additionally, this chapter presents research of silence with review of main aspects of papers in the field of conversational analysis, ethnography of communication and metaphor of ...
Improving the Accuracy of Text Classification using Stemming Method, A Case of Non-formal Indonesian Conversation
Abstract Stemming has long been used in data pre-processing in information retrieval, which aims to make affix words into root words. However, there are not many stemming m...
Sleep Habits and Occurrence of Lowback Pain among Craftsmen
Hydatid Disease of The Brain Parenchyma: A Systematic Review
Abstract Introduction Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...
Enhancing Non-Formal Learning Certificate Classification with Text Augmentation: A Comparison of Character, Token, and Semantic Approaches
Aim/Purpose: The purpose of this paper is to address the gap in the recognition of prior learning (RPL) by automating the classification of non-formal learning certificates using d...
Computer-Mediated Chat
The technical apparatus is, then, being made at home with the rest of our world. And that's a thing that's routinely being done, and it's the source of the failure of technocratic ...
