Improving Chinese hate speech detection with BERT-FastText fusion and BERT-BiLSTM fusion

Hate speech detection is essential in online environments, especially on social media platforms, where it helps create a safer space and reduce the risk of real-world harm. In Chinese, the task is particularly challenging because of the language's distinctive linguistic structures and the frequent use of indirect expressions, sarcasm, homophones, character variants, and abbreviations. This study investigates how to improve Chinese hate speech detection by combining BERT with FastText and BERT with BiLSTM. Six model variants are configured: frozen BERT and fine-tuned BERT, each used as a baseline and further extended with either FastText sentence embeddings or a clause-level BiLSTM. Experiments are conducted on a self-annotated Chinese social media dataset and the public COLDataset corpus, including a cross-dataset setting in which models are trained on the self-annotated data and evaluated on COLDataset. The results show that fine-tuning BERT is the main source of performance gain, and that adding FastText or BiLSTM improves over the corresponding BERT baselines. Among all models, fine-tuned BERT combined with FastText achieves the best in-domain performance, reaching 92.58% accuracy on the self-annotated dataset, while also achieving a strong ROC-AUC in the cross-dataset evaluation. Overall, these findings indicate that simple feature-level fusion of BERT with lexical or clause-level information is an effective and computationally practical way to improve Chinese hate speech detection.
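As a rough illustration of what feature-level fusion can mean here, the sketch below concatenates a BERT sentence vector with a FastText sentence embedding and passes the result to a logistic classifier. It is not the thesis's implementation: the function names `fuse` and `predict_proba`, the dimensions 768 and 300, and the random placeholder vectors and weights are all assumptions chosen for illustration, and a real system would obtain the vectors from the two encoders and learn the weights during training.

```python
import math
import random


def fuse(bert_vec, fasttext_vec):
    """Feature-level fusion: concatenate the two sentence representations."""
    return list(bert_vec) + list(fasttext_vec)


def predict_proba(features, weights, bias):
    """Logistic classifier over the fused features; returns P(hate speech)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Assumed dimensions (hypothetical, not taken from the thesis).
BERT_DIM, FASTTEXT_DIM = 768, 300

# Placeholder vectors standing in for real encoder outputs.
rng = random.Random(0)
bert_vec = [rng.uniform(-1, 1) for _ in range(BERT_DIM)]
fasttext_vec = [rng.uniform(-1, 1) for _ in range(FASTTEXT_DIM)]

fused = fuse(bert_vec, fasttext_vec)

# Placeholder classifier parameters; in practice these are learned.
weights = [rng.uniform(-0.01, 0.01) for _ in range(len(fused))]
p = predict_proba(fused, weights, bias=0.0)

print(len(fused))  # length of the fused feature vector
```

Concatenation keeps both representations intact and leaves it to the classifier to weight them, which is one reason this kind of fusion adds little computational overhead on top of the encoders themselves.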

Office of Academic Resources, Chulalongkorn University

Methini Ma

2026
