
A Comparative Analysis of Word Embedding and Deep Learning for Arabic Sentiment Classification

Sentiment analysis on social media platforms (e.g., Twitter or Facebook) has become an important tool for learning about users’ opinions and preferences. However, the accuracy of sentiment analysis is limited by the challenges of natural language processing (NLP). Recently, deep learning models have outperformed statistical and lexicon-based approaches in NLP-related tasks. Word embedding is an important layer of deep learning models, generating their input features. Many word embedding models have been proposed for text representation, including both classic and context-based (contextualized) embeddings. In this paper, we present a comparative analysis of classic and contextualized word embeddings for sentiment analysis. The four most frequently used word embedding techniques were evaluated in both their trained and pretrained versions, covering classical and contextualized approaches. The classical techniques are GloVe, Word2vec, and FastText, while ARBERT is used as the contextualized embedding model. Since word embeddings are typically employed as the input layer of deep networks, we used two deep learning architectures, BiLSTM and CNN, for sentiment classification. The experiments were applied to a series of benchmark datasets: HARD, Khooli, AJGT, ArSAS, and ASTD, and a comparative analysis was conducted on the results obtained by the evaluated models. Our outcomes indicate that, in general, embeddings trained with a given technique outperform the pretrained version of the same technique by around 0.28 to 1.8% in accuracy, 0.33 to 2.17% in precision, and 0.44 to 2% in recall. Moreover, the contextualized transformer-based BERT model (ARBERT) achieved the highest performance in both its pretrained and trained versions.
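The abstract describes word embeddings serving as the input layer of a BiLSTM or CNN classifier. As a rough illustration only, the pure-Python sketch below shows the forward pass of a tiny CNN text classifier: an embedding lookup, a 1-D convolution over word windows with ReLU, max-over-time pooling, and a sigmoid output. The vocabulary, sizes, random weights, and function names (`embed`, `conv_max_pool`, `predict_positive`) are hypothetical and not taken from the paper; training is not shown, so the output score is meaningless until the weights are learned. In a real system the embedding table would hold Word2vec, GloVe, or FastText vectors (pretrained or trained on the task corpus), whereas ARBERT would supply context-dependent vectors from the transformer rather than a fixed lookup table. A BiLSTM variant would typically replace the convolution and pooling stage with forward and backward LSTM passes whose final states are concatenated.

```python
import math
import random

# Hypothetical toy sizes; the paper's actual hyperparameters are not given here.
EMBED_DIM = 4
KERNEL_SIZE = 2   # each filter spans two consecutive words
NUM_FILTERS = 3

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible


def rand_vec(n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


# Embedding table (hypothetical vocabulary). In practice these vectors would be
# pretrained (e.g., Word2vec, GloVe, FastText) or learned from the task corpus.
vocab = ["<unk>", "الخدمة", "ممتازة", "سيئة", "جدا"]
embeddings = {word: rand_vec(EMBED_DIM) for word in vocab}
PAD = [0.0] * EMBED_DIM  # zero vector used to pad inputs shorter than one window

# Convolution filters: one weight vector per word position in the window.
filters = [[rand_vec(EMBED_DIM) for _ in range(KERNEL_SIZE)] for _ in range(NUM_FILTERS)]
filter_bias = [0.0] * NUM_FILTERS
out_weights = rand_vec(NUM_FILTERS)
out_bias = 0.0


def embed(tokens):
    """Embedding layer: map each token to its vector; unknown words use <unk>."""
    return [embeddings.get(tok, embeddings["<unk>"]) for tok in tokens]


def conv_max_pool(vectors):
    """1-D convolution over word windows, ReLU, then max-over-time pooling."""
    pooled = []
    for filt, bias in zip(filters, filter_bias):
        activations = []
        for start in range(len(vectors) - KERNEL_SIZE + 1):
            s = bias
            for k in range(KERNEL_SIZE):
                s += sum(w * x for w, x in zip(filt[k], vectors[start + k]))
            activations.append(max(0.0, s))  # ReLU
        pooled.append(max(activations))  # strongest response anywhere in the text
    return pooled


def predict_positive(tokens):
    """Forward pass only: a sigmoid score to be read as P(positive) after training."""
    vectors = embed(tokens)
    vectors += [PAD] * max(0, KERNEL_SIZE - len(vectors))
    features = conv_max_pool(vectors)
    z = out_bias + sum(w * h for w, h in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-z))


print(predict_positive("الخدمة ممتازة جدا".split()))
```

Max-over-time pooling is what lets the same fixed-size feature vector come out of sentences of any length, which is one reason CNNs are a common baseline next to BiLSTMs for sentence-level sentiment.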
Additionally, the results indicate that BiLSTM outperforms CNN by approximately 2% on three datasets (HARD, Khooli, and ArSAS), while CNN achieves around 2% higher performance on the smaller datasets, AJGT and ASTD.
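The reported gaps (e.g., trained versus pretrained embeddings, or BiLSTM versus CNN) are differences in accuracy, precision, and recall. The sketch below shows one way such metrics and their percentage-point differences could be computed from predictions. Macro averaging (an unweighted mean over classes) is an assumption here, since the abstract does not state which averaging scheme was used for the multi-class datasets; the function name `evaluate` and the label lists are hypothetical, illustrative data rather than results from the paper.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for any number of classes.

    Macro averaging is an assumption; the paper's averaging scheme is not stated.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    labels = set(y_true) | set(y_pred)
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)


# Hypothetical gold labels and predictions from two models (e.g., trained vs.
# pretrained embeddings); illustrative only, not the paper's data.
gold = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
trained = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
pretrain = [1, 0, 0, 0, 1, 1, 0, 1, 1, 0]

for name, a, b in zip(("accuracy", "precision", "recall"),
                      evaluate(gold, trained), evaluate(gold, pretrain)):
    # prints "+20.00 percentage points" for each metric with this toy data
    print(f"{name}: {100 * (a - b):+.2f} percentage points")
```

With class-imbalanced datasets, macro and weighted averages can differ noticeably, so the averaging choice should be stated alongside any reported gap.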

MDPI AG

Sahar F. Sabbeh, Heba A. Fasihuddin

Electronics

2023

