Leveraging Semi-Supervised Learning for Multimodal Hate Speech Data Annotation and Detection

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4un2wjkpdn2m

Abstract

While the Internet and social media have fundamentally transformed our lives, they can also rapidly spread hate speech, i.e., derogatory statements targeting individuals or groups based on their immutable characteristics. Automatic detection systems could help limit this harmful phenomenon. However, the lack of large-scale annotated datasets remains a major bottleneck for developing better algorithms. In this work, we employ semi-supervised learning (SSL) to combine limited labeled data with large amounts of unlabeled data. We apply three SSL approaches, Fix-match, Full-match, and All-match learning, to improve end-to-end pre-trained speech and text models for hate speech detection. Our findings indicate that SSL methods improve performance, achieving F1 scores of 0.851 on speech, 0.957 on text, and 0.959 with multimodal fusion. Furthermore, we analyze the impact of different weak augmentation strategies on labeled data and assess the quality of generated pseudo-labels to evaluate their potential use in data annotation.

Details

Paper ID
lrec2026-main-806
Pages
pp. 10266-10275
BibKey
rammohan-etal-2026-leveraging
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Rathi Adarshi Rammohan
  • Zhao Ren
  • Dominik Puchała
  • Aleksandra Świderska
  • Dennis Küster
  • Tanja Schultz
