Extremism Detection in the Iraqi Dialect Based on Machine Learning
Extremism detection is an important area of natural language processing (NLP), used to identify hate speech, sectarianism, and terrorism-related content on social media. The task has been studied for many widely used languages, especially Arabic and English, but most studies address the standard language and not its dialects, even though users of social networking sites write in their own dialects. The Iraqi dialect is one of the most difficult Arabic dialects, and little Iraqi data is available on the Internet for researchers to use. This research therefore aims to detect extremism in Iraqi texts using machine learning. The data were pre-processed by removing prefixes and suffixes from Iraqi words, removing repeated letters within a word, and removing Iraqi stop words. In the embedding step, words were represented with pre-trained embeddings as well as embeddings trained with Gensim Word2Vec and FastText. Four classifiers were used: Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (KNN), and Gaussian Naive Bayes (GNB). The experiments were conducted on two Iraqi extremism-related datasets collected from social media platforms: the Iraqi Facebook Comments Dataset (IFCD) and the Iraqi Tweets Dataset (ITD). All models were evaluated using accuracy, macro-average precision, macro-average recall, and macro-average F1-score; the best F1-score is 0.9521, with recall of 0.95 and precision of 0.955. In addition, the models were tested on a publicly available Iraqi hate speech dataset, and the results were compared with those reported by the work that introduced that dataset.
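The three pre-processing steps named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the stop-word and affix lists below are small hypothetical placeholders (the abstract does not list the actual ones), the order of the steps is assumed, "repeated letters" is interpreted as collapsing any run of the same character to one, and the minimum stem length is an invented safeguard.

```python
import re

# Hypothetical placeholder lists; the actual Iraqi stop words and affixes
# used in the research are not given in the abstract.
STOP_WORDS = {"شنو", "هسه", "اني"}
PREFIXES = ("ال", "و")
SUFFIXES = ("ات", "ين")


def collapse_repeats(word):
    """Reduce every run of the same character to a single character."""
    return re.sub(r"(.)\1+", r"\1", word)


def strip_affixes(word, min_stem=3):
    """Remove at most one listed prefix and one listed suffix.

    An affix is removed only if at least `min_stem` characters remain
    (an assumed safeguard against over-stripping short words).
    """
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(text):
    """Whitespace-tokenize, collapse repeats, drop stop words, strip affixes."""
    cleaned = []
    for token in text.split():
        token = collapse_repeats(token)
        if token in STOP_WORDS:
            continue
        cleaned.append(strip_affixes(token))
    return cleaned
```

Collapsing every repeated character also shortens words whose correct spelling contains a doubled letter, so a real pipeline might only shorten runs of three or more characters; the abstract does not say which rule was applied.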
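The macro-averaged metrics mentioned in the abstract give every class equal weight regardless of how many examples it has. The sketch below computes them from scratch, assuming the common convention that macro F1 is the unweighted mean of the per-class F1 scores; the abstract does not state which convention the authors followed, and the function name and the 0/1 labels in the usage note are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged precision, recall, and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted
        # or never occurs in the gold labels.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For example, `macro_scores([1, 1, 0, 0], [1, 0, 0, 0])` averages the scores of class 0 and class 1 equally, even though the classifier predicts class 0 three times out of four.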
University of Baghdad College of Science

