
Improving Chinese hate speech detection with BERT-FastText fusion and BERT-BiLSTM fusion

Hate speech detection is essential in online environments, especially on social media platforms, where it helps create safer spaces and reduce the risk of real-world harm. In Chinese, the task is particularly challenging because of the language's linguistic structure and the frequent use of indirect expressions, sarcasm, homophones, character variants, and abbreviations. This study investigates how to improve Chinese hate speech detection by combining BERT with FastText and BERT with BiLSTM. Six model variants are configured: frozen BERT and fine-tuned BERT, each used on its own as a baseline or extended with either FastText sentence embeddings or a clause-level BiLSTM. Experiments are conducted on a self-annotated Chinese social media dataset and on the public COLDataset corpus, including a cross-dataset setting in which models are trained on the self-annotated data and evaluated on COLDataset. The results show that fine-tuning BERT is the main source of performance gain, and that adding FastText or BiLSTM improves over the corresponding BERT baseline. Among all models, fine-tuned BERT combined with FastText achieves the best in-domain performance, reaching 92.58% accuracy on the self-annotated dataset, while also showing strong ROC-AUC in the cross-dataset evaluation. Overall, these findings indicate that simple feature-level fusion of BERT with lexical or clause-level information is an effective and computationally practical way to improve Chinese hate speech detection.
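As a rough illustration of what feature-level fusion can look like, the sketch below concatenates a BERT sentence vector with a FastText sentence vector and passes the result to a small logistic classifier. It is a minimal sketch under stated assumptions, not the thesis's implementation: the vector sizes (768 and 300), the random stand-in vectors, the untrained weights, and the helper names `fuse_features` and `predict_proba` are all hypothetical and chosen only for illustration.

```python
import math
import random

BERT_DIM = 768      # assumption: hidden size of a BERT-base style encoder
FASTTEXT_DIM = 300  # assumption: a commonly used FastText vector size


def fuse_features(bert_vec, fasttext_vec):
    """Feature-level fusion: concatenate the two sentence vectors."""
    return list(bert_vec) + list(fasttext_vec)


def predict_proba(features, weights, bias=0.0):
    """Hypothetical logistic head mapping fused features to P(hate speech)."""
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


random.seed(0)

# Stand-ins for real encoder outputs; a real pipeline would take these
# from a BERT model and a FastText model applied to the same sentence.
bert_vec = [random.gauss(0.0, 1.0) for _ in range(BERT_DIM)]
fasttext_vec = [random.gauss(0.0, 1.0) for _ in range(FASTTEXT_DIM)]

fused = fuse_features(bert_vec, fasttext_vec)

# Untrained placeholder weights; in practice these would be learned.
weights = [random.gauss(0.0, 0.01) for _ in range(len(fused))]
prob = predict_proba(fused, weights)

print(len(fused), round(prob, 3))
```

In this reading, the frozen and fine-tuned variants would differ only in whether the encoder that produces `bert_vec` is updated during training; the fusion step itself stays the same.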

Office of Academic Resources, Chulalongkorn University

Methini Ma

2026


