
Hate Speech Detection Using Textual and User Features

Social media platforms give users a powerful way to share their ideas. Using one’s right to expression to incite hatred toward a particular group of people is inappropriate, yet hate speech is pervasive in our society. Spreading hate through online social networks such as Facebook, Twitter, TikTok, and Instagram is commonplace today. One example is the COVID-19 pandemic, which engendered anti-Asian hate. The current literature contains few studies that use user features in conjunction with textual features to detect hate. This thesis combines textual features with user features to improve on state-of-the-art hate speech detection techniques. To test our approach, we used four publicly available datasets. We used various tools to access the Twitter APIs and extract the required user information, which we either used directly or used to compute further features. We represented the textual features as BERT embeddings and linguistic features. The 97 linguistic measures, computed with the Linguistic Inquiry and Word Count (LIWC) tool, quantify the text’s cognitive, affective, and grammatical processes. The user features consisted of demographic, behavioral, emotion-based, personality, readability, and writing-style features. Our experimental evaluation over three datasets shows that combining the top twenty linguistic features with the top twenty user features is the best combination for hate speech detection. Hate speech is mostly emotionally charged. We further analyzed these user and linguistic features. Among the most intuitive and prominent results was that features such as anger, negative emotion, swearing, fear, and annoyance were high in hate speech, while the happiness feature was low. We compared multiple approaches with the existing state of the art. We found that the best approach using textual features was combining LIWC features with BERT embeddings.
This combination gave F1 scores of 0.82 and 0.79 on the Crowd-sourced (DS1) and Kaggle (DS3) datasets, respectively. Next, we identified the top LIWC and user features for hate speech detection. We found that features representing negative emotions, such as anger, fear, sadness, and annoyance, were prominently high in hate speech, whereas happiness was lower. We then analyzed the F1 scores obtained with standalone LIWC features, standalone user features, and their combinations. The combination of the top twenty LIWC features and the top twenty user features gave the best F1 scores: 0.74, 0.90, and 0.64 on the DS1, NAACL (DS2), and anti-Asian COVID hate (DS4) datasets, respectively. Finally, we used traditional machine learning algorithms that combine BERT embeddings with the top twenty linguistic features and the top twenty user features, obtaining F1 scores of 0.78, 0.92, and 0.84 on DS1, DS2, and DS4, respectively. We also compared our approach with other studies that use user and textual features.
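
The feature combination and F1 evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the top twenty features were ranked or which classifiers were used, so everything here is an assumption for illustration: the ranking criterion (absolute difference of class means) is a simple stand-in, and the helper names `f1_score`, `top_k_features`, and `combine_features` are hypothetical, not taken from the thesis.

```python
from typing import List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive (hate) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_features(rows: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k columns that best separate the two classes.

    Assumed criterion (not from the thesis): absolute difference between the
    mean of a column over hate samples (label 1) and non-hate samples (label 0).
    """
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    n_features = len(rows[0])

    def mean(col: int, group: Sequence[Sequence[float]]) -> float:
        return sum(r[col] for r in group) / len(group)

    scores = [abs(mean(j, pos) - mean(j, neg)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def combine_features(
    embedding: Sequence[float],
    linguistic: Sequence[float],
    user: Sequence[float],
    linguistic_idx: Sequence[int],
    user_idx: Sequence[int],
) -> List[float]:
    """Concatenate a text embedding with the selected linguistic and user features."""
    selected_linguistic = [linguistic[j] for j in linguistic_idx]
    selected_user = [user[j] for j in user_idx]
    return list(embedding) + selected_linguistic + selected_user
```

In this sketch, the combined vector for each post would be passed to any traditional classifier, and `f1_score` would be computed on held-out predictions. With real data, the embedding would come from a BERT model and the linguistic vector from LIWC output, with `k` set to 20 for each feature group.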

Boise State University, Albertsons Library

Rohan Raut

2023



