
Improving Chinese hate speech detection with BERT-FastText fusion and BERT-BiLSTM fusion

Hate speech detection is an essential technique in the online environment, especially on social media platforms, where it helps create a safer space and reduces the risk of real-world harm. In Chinese, the task is particularly challenging because of the language's linguistic structure and the frequent use of indirect expressions, sarcasm, homophones, character variants, and abbreviations. This study investigates how to improve Chinese hate speech detection by combining BERT with FastText and BERT with BiLSTM. Six model variants are configured: frozen BERT and fine-tuned BERT baselines, each further extended with either FastText sentence embeddings or a clause-level BiLSTM. Experiments are conducted on a self-annotated Chinese social media dataset and the public COLDataset corpus, including a cross-dataset setting in which models are trained on the self-annotated data and evaluated on COLDataset. The results show that fine-tuning BERT is the main source of performance gain, and that adding FastText or BiLSTM improves over the corresponding BERT baseline. Among all models, fine-tuned BERT combined with FastText achieves the best in-domain performance, reaching 92.58% accuracy on the self-annotated dataset, while also achieving a strong ROC–AUC in the cross-dataset evaluation. Overall, these findings indicate that simple feature-level fusion of BERT with lexical or clause-level information is an effective and computationally practical way to improve Chinese hate speech detection.
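As an illustration of what feature-level fusion can look like, the sketch below concatenates a BERT sentence vector with a FastText sentence embedding and passes the result to a logistic classifier head. The abstract does not state the fusion operator or the classifier used, so concatenation, the logistic head, the toy dimensions, and the names `fuse` and `predict_proba` are all assumptions made for this example; the random vectors stand in for real model outputs.

```python
import math
import random


def fuse(bert_vec, fasttext_vec):
    """Feature-level fusion by concatenation (assumed operator, not confirmed by the source)."""
    return list(bert_vec) + list(fasttext_vec)


def predict_proba(features, weights, bias):
    """Logistic head: maps a fused feature vector to a probability in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Stand-in vectors (hypothetical): a real pipeline would take a sentence vector
# from a BERT encoder and a sentence embedding from a FastText model instead.
rng = random.Random(0)
BERT_DIM, FASTTEXT_DIM = 8, 4  # toy sizes chosen for readability
bert_vec = [rng.uniform(-1, 1) for _ in range(BERT_DIM)]
fasttext_vec = [rng.uniform(-1, 1) for _ in range(FASTTEXT_DIM)]

# Fuse the two representations, then score with an untrained, illustrative head.
fused = fuse(bert_vec, fasttext_vec)
weights = [rng.uniform(-0.1, 0.1) for _ in fused]
prob = predict_proba(fused, weights, bias=0.0)

print(len(fused), round(prob, 3))
```

In this arrangement the classifier sees both representations side by side, so the only added cost over a BERT-only head is the extra input width, which is consistent with the abstract's description of the approach as computationally practical.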
Office of Academic Resources, Chulalongkorn University

Related Results

Bilingual Hate Speech Detection on Social Media : Amharic and Afaan Oromo
Abstract Due to significant increases in internet penetration and the development of smartphone technology during the preceding couple of decades, many people have started ...
Hate Speech Detection Using Textual and User Features
Social media platforms provide users with a powerful platform to share their ideas. Using one’s right to expression to incite hatred toward a particular group of people ...
Vihapuheen kohteet ja teemat sekä lajit ja muodot ennen ja nyt (Targets and themes of hate speech, and its types and forms, then and now)
This article analyzes the nature of hate speech and the forms its discourse took in the 1930s and the 2000s. The aim has been to identify the similarities and differences that the two different eras...
Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications
Social media applications, such as Twitter and Facebook, allow users to communicate and share their thoughts, status updates, opinions, photographs, and videos around the globe. Un...
Forensic Linguistics of Hate Speech on Social Media against President Joko Widodo by Chairman of UGM’s Student Executive Board
This research discusses the hate speech delivered by the chairman of BEM UGM against President Joko Widodo, uploaded on social media. This research uses a forensic linguistic appro...
From Hate Crime to Disability Hate Crime
This chapter traces the journey from hate crime to Disability Hate Crime through an analysis of the relevant literature including policy related documents which construct and refer...
Automatic Hate Speech Detection and the hassle of Offensive Language
A key task for automatic hate-speech detection on social media is the separation of hate speech from different instances of offensive language. Lexical detection strategies tend to...
