
HasNat@CHiPSAL 2026: Multimodal Hate Speech Detection in Low-Resource Nepali Memes Using Aligned Vision–Language Models

Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)

DOI:10.63317/54xeeoceu6qz

Abstract

Memes are widely used for communication on social media but are increasingly exploited to spread hate and harmful stereotypes. Detecting hate speech in memes is particularly challenging because meaning is conveyed jointly through images and embedded text, and the problem becomes more complex in low-resource languages such as Nepali. In this work, we participate in Subtask A of the CHiPSAL 2026 Shared Task, which focuses on hate speech detection in Nepali-only memes. We benchmark three multimodal vision-language backbones, ViT-B-32 (OpenCLIP), AltCLIP, and BLIP2+mT5, under controlled preprocessing and augmentation settings. Our best-performing system uses AltCLIP to extract aligned text and image representations, followed by a late-fusion classifier trained with stratified 5-fold cross-validation to address class imbalance. The proposed model achieves a macro F1-score of 0.66 on the validation set. Experimental results highlight the effectiveness of aligned vision-language representations and demonstrate that the effects of preprocessing and augmentation strategies are model-dependent in low-resource multimodal hate speech detection.
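The two evaluation ingredients named in the abstract, stratified 5-fold splitting and the macro F1-score, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the abstract gives no implementation details, the function names `stratified_folds` and `macro_f1` are hypothetical, and a real pipeline would more likely rely on an established library. The sketch only shows why stratification keeps the class ratio stable across folds and why macro averaging weights the minority class equally.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold keeps the class ratio.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced label set (20 non-hate, 10 hate); purely illustrative data.
toy_labels = [0] * 20 + [1] * 10
folds = stratified_folds(toy_labels, k=5)
```

With the toy labels above, each of the five folds receives four samples of class 0 and two of class 1, mirroring the overall 2:1 ratio. Each fold can then serve once as the held-out split while a classifier is trained on the remaining four.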

Details

Paper ID
lrec2026-ws-chipsal-23
Pages
pp. 237-243
BibKey
chowdhury-etal-2026-hasnat
Editors
Kengatharaiyer Sarveswaran, Ashwini Vaidya
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Alvee Hasan Chowdhury

  • MD. ABUL HASNAT

  • Adnan Faisal
