Beyond Generic Responses: Target-Aware Strategies for Countering Hate Speech
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Effective counter-narratives (CNs) are essential for combating online hate speech, yet generic responses often fail to address the specific needs of targeted groups. This paper proposes a target-aware CN generation framework that incorporates demographic-specific tokens into transformer-based models. Our approach enhances contextual relevance by introducing target-group tokens into the model’s vocabulary. To assess CN quality, we employ a multifaceted evaluation framework combining automatic metrics with LLM-as-a-judge evaluation (JudgeLM). Evaluation across a wide range of language models demonstrates that target-group tokens markedly improve the contextual relevance of generated CNs, particularly for small and medium models, with measurable gains in both CN validity and contextual relevance. Even for large instruction-tuned models such as LLaMA-3, incorporating target-specific information proves effective in enhancing the contextual relevance of generated responses. Warning: This paper contains offensive text that is used only for the purpose of combating online hate.
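The core mechanism in the abstract, adding target-group tokens to a model's vocabulary so generation can condition on the targeted demographic, can be sketched in a minimal, library-free form. The token names (e.g. `[TGT_MIGRANTS]`), the toy whitespace tokenizer, and the helper functions below are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal sketch of target-group token conditioning: register one special
# token per target group in the vocabulary, then prepend it to the encoded
# input so the model can attend to the targeted demographic.
# Token names and the toy tokenizer are illustrative assumptions only.

def add_target_tokens(vocab, target_groups):
    """Register one special token per target group; return group -> token id."""
    ids = {}
    for group in target_groups:
        token = f"[TGT_{group.upper()}]"
        if token not in vocab:
            vocab[token] = len(vocab)  # append new token at end of vocabulary
        ids[group] = vocab[token]
    return ids

def encode(vocab, text, target_group=None):
    """Toy whitespace tokenizer; prepends the target-group token if given."""
    ids = [vocab.setdefault(w, len(vocab)) for w in text.lower().split()]
    if target_group is not None:
        ids = [vocab[f"[TGT_{target_group.upper()}]"]] + ids
    return ids

vocab = {"[PAD]": 0, "[BOS]": 1, "[EOS]": 2}
tgt_ids = add_target_tokens(vocab, ["women", "migrants"])
seq = encode(vocab, "example hateful input", target_group="migrants")
# seq now begins with the id of [TGT_MIGRANTS]
```

In practice, after extending a pretrained tokenizer's vocabulary this way, the model's embedding matrix must be resized so the new tokens receive trainable embeddings (e.g. via `resize_token_embeddings` in Hugging Face Transformers).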