
Evaluation and Comparison of SVM, Deep Learning, and Naïve Bayes Performances for Natural Language Processing Text Classification Task

Description:
Text classification is one of the most important tasks in natural language processing. In this research, we ran several experiments on three of the most popular NLP text classifiers: Convolutional Neural Network (CNN), Multinomial Naive Bayes (MNB), and Support Vector Machine (SVM).
Given enough training data, the deep learning CNN performs best on all evaluation parameters with 77% accuracy, followed by SVM with 76% accuracy and Multinomial Naive Bayes with the lowest accuracy, 69%.
CNN performs best when the training dataset is large enough because its filters (kernels) help to identify patterns in text data regardless of their position in the sentence.
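The position-independence of a convolutional filter can be illustrated with a small sketch. This is not the network used in the study, whose architecture is not described here; the two-dimensional embeddings in `EMBED` and the weights in `FILTER` are made up for illustration. The sketch slides one width-2 filter over a sentence and then applies max-over-time pooling, so the pooled feature is the same wherever the matching bigram occurs.

```python
# Toy 1D convolution over word vectors followed by max-over-time pooling.
# EMBED and FILTER are hypothetical values chosen for this sketch only.

EMBED = {
    "not": [1.0, 0.0],
    "good": [0.0, 1.0],
    "the": [0.0, 0.0],
    "movie": [0.0, 0.0],
    "was": [0.0, 0.0],
}

# A width-2 filter: one weight vector per position in the window.
# With these weights it responds most strongly to the bigram "not good".
FILTER = [[1.0, 0.0], [0.0, 1.0]]


def conv1d(tokens, kernel):
    """Slide the kernel over the tokens; return one activation per window."""
    width = len(kernel)
    vectors = [EMBED[t] for t in tokens]
    activations = []
    for start in range(len(vectors) - width + 1):
        total = 0.0
        for offset in range(width):
            vec = vectors[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], vec))
        activations.append(total)
    return activations


def max_over_time(activations):
    """Keep only the strongest response and discard where it occurred."""
    return max(activations)


early = "not good the movie was".split()
late = "the movie was not good".split()

feature_early = max_over_time(conv1d(early, FILTER))
feature_late = max_over_time(conv1d(late, FILTER))
print(feature_early, feature_late)
```

Both sentences yield the same pooled feature even though the bigram sits at the start of one and the end of the other. A trained CNN learns many such filters from data, which is one plausible reason it benefits from a larger training set.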
We then repeated the training with just one-third of our data. At this point SVM gives the best performance; the performance of CNN drops noticeably but remains better than that of Multinomial Naive Bayes. We attribute SVM's advantage on the reduced training data to its search for a hyperplane that forms a boundary between the classes of data so as to classify them properly. We believe that finding this hyperplane was more efficient on the reduced dataset, hence the good performance.
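The idea of a separating hyperplane can be sketched as follows. This is a simplified stand-in, not the solver used in the study: it fits a linear soft-margin SVM by subgradient descent on the regularized hinge loss, and the 2D points, the learning rate `lr`, the regularization strength `lam`, and the epoch count are all assumptions chosen for illustration.

```python
# Linear SVM sketch: learn w and b so that sign(w.x + b) separates two classes.
# Data and hyperparameters are hypothetical; real text features would be
# high-dimensional (for example bag-of-words or TF-IDF vectors).


def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=50):
    """Subgradient descent on lam/2 * |w|^2 + hinge loss, one sample at a time."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin or misclassified: move the hyperplane.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is safely classified: only apply the regularization shrink.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict(x, w, b) for x in points])
```

Only the samples that violate the margin change the hyperplane, so the boundary is determined by a few points near it rather than by the full dataset.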
Multinomial Naive Bayes has the lowest performance, which we attribute to its assumption of independence between features, an assumption that sometimes does not hold.
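The independence assumption shows up directly in how Multinomial Naive Bayes scores a document: the class score is the log prior plus a sum of per-word log probabilities, with no term for word order or word interaction. The sketch below is a minimal version with add-one (Laplace) smoothing; the four-document corpus and the function names are hypothetical and are not taken from the study.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs):
    """docs is a list of (text, label) pairs; returns priors, counts, vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    total = sum(class_docs.values())
    log_prior = {c: math.log(n / total) for c, n in class_docs.items()}
    return log_prior, word_counts, vocab


def log_score(text, label, log_prior, word_counts, vocab):
    """Log prior plus one independent log-probability term per known word."""
    total_words = sum(word_counts[label].values())
    score = log_prior[label]
    for word in text.split():
        if word not in vocab:
            continue  # words never seen in training are ignored
        p = (word_counts[label][word] + 1) / (total_words + len(vocab))
        score += math.log(p)
    return score


def predict(text, log_prior, word_counts, vocab):
    """Pick the class with the highest log score."""
    return max(
        log_prior,
        key=lambda c: log_score(text, c, log_prior, word_counts, vocab),
    )


docs = [
    ("great film", "pos"),
    ("great acting", "pos"),
    ("boring film", "neg"),
    ("bad acting", "neg"),
]
model = train_mnb(docs)
print(predict("great film", *model), predict("boring acting", *model))
```

Because each word contributes its own term, the score for "great film" is exactly the sum of the contributions of "great" and "film", regardless of how the words relate to each other in the sentence.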
We conclude that the availability of data should be an important factor when choosing a classifier for an NLP text classification task.
CNN should be used when enough data is available, and SVM should be used when data is limited.
Multinomial Naive Bayes must not be trusted with state-of-the-art NLP tasks because of its assumption of independence between features.

Related Results

Hubungan Perilaku Pola Makan dengan Kejadian Anak Obesitas
Optimising tool wear and workpiece condition monitoring via cyber-physical systems for smart manufacturing
Smart manufacturing has been developed since the introduction of Industry 4.0. It consists of resource sharing and networking, predictive engineering, and material and data analyti...
Analisis Sentimen Layanan Pelanggan Provider Internet dengan Algoritma Support Vector Machine dan Naïve Bayes
The rise in customer complaints and praise regarding internet services shows the importance of understanding public opinion comprehensively. If this is not put to good use, ...
Sentiment Analysis of IMDb Movie Reviews Using SVM and Naive Bayes Classifier
Sentiment analysis is a powerful tool for understanding public opinion, especially in the entertainment industry. Opinion in the form of text reviews plays a significant role in th...
STUDI KLASIFIKASI TOPIK BERITA DENGAN ALGORITMA MACHINE LEARNING
As a result of the use and access of social media, it also has an impact on increasing the amount of data and information, especially text data. Text has become one of the most nat...
Acoustic event detection and classification
Human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events, whether produced by the human body or by objects ...
Enhancing Non-Formal Learning Certificate Classification with Text Augmentation: A Comparison of Character, Token, and Semantic Approaches
Aim/Purpose: The purpose of this paper is to address the gap in the recognition of prior learning (RPL) by automating the classification of non-formal learning certificates using d...
Klasifikasi Sentimen Masyarakat terhadap Presiden Indonesia Menggunakan Metode Naive Bayes
Abstract. Social media platform X has become an important platform for expressing public opinion, particularly in the political context, including the 2024 Presidential Election in...
