Nicolás Benjamín Ocampo, Elena Cabrio, and Serena Villata. 2023. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore.
This research addresses the challenge of detecting implicit hate speech (HS) in user-generated content. Its contribution is fourfold: a comparative analysis of transformer-based models on datasets containing implicit HS, an examination of how embedding representations capture veiled cases, a method linking explicit and implicit HS through their shared targets to improve those representations, and a demonstration of improved classification performance on borderline HS cases.
Playing the Part of the Sharp Bully: Generating Adversarial Examples for Implicit Hate Speech Detection
Nicolás Benjamín Ocampo, Elena Cabrio, and Serena Villata. 2023. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada.
This paper introduces a framework for generating adversarial implicit hate speech (HS) messages using Auto-regressive Language Models, categorizing them into EASY, MEDIUM, and HARD complexity levels. It also presents a "build it, break it, fix it" training approach, demonstrating that retraining state-of-the-art models with HARD messages significantly improves their performance on implicit HS detection.
Nicolás Benjamín Ocampo, Elena Cabrio, and Serena Villata. 2023. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia.
The study examines why subtle and implicit hate speech (HS) on social media is harder to detect than explicit HS. It shows that state-of-the-art neural network models identify explicit HS effectively but struggle with subtle and implicit forms, underscoring the need for further research in this area.