
eGrantha.ai@CHiPSAL 2026: Stochastic Image Captioning for Robust Hate Speech Detection in Low-Resource Nepali Memes

Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)

DOI:10.63317/5d8j4bfmcvry

Abstract

This paper presents a system for hate speech detection in low-resource Nepali memes, submitted as part of Subtask A of the Shared Task on Multimodal Understanding at CHiPSAL 2026. Detecting hateful memes is particularly challenging due to the combination of images, text, and emojis used to portray humor, satire, or sociopolitical commentary, as well as the low-resource nature of the Nepali language. We investigate a range of unimodal and multimodal modeling strategies, including text-only, vision-text, and caption-based approaches. For caption generation, the Gemini family of models (Gemini 2.X and Gemini 3.X) was used to produce contextually rich captions, which are publicly released as NeMeme-CAP on Hugging Face. Caption-based modeling leverages stochastic caption augmentation to address class imbalance and Test-Time Augmentation (TTA) to reduce prediction variance and improve model robustness. The best-performing system fine-tunes an encoder-only transformer model, RoBERTa-base, on the generated captions, achieving third place on the official leaderboard with a macro-averaged F1-score of 0.7397. The code is publicly available at https://github.com/thapaliya123/LREC-CHiPSAL-2026.
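The Test-Time Augmentation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that several stochastically generated captions exist for one meme and that a classifier returns the probability of the hateful class for each caption; the per-caption probabilities are then averaged before thresholding. The names `tta_predict` and `predict_proba`, the mean aggregation, and the 0.5 threshold are hypothetical choices for illustration and are not taken from the paper or its repository.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple


def tta_predict(
    captions: Sequence[str],
    predict_proba: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[float, int]:
    """Average the hateful-class probability over several captions of one meme.

    `predict_proba` stands in for any caption classifier (for example, a
    fine-tuned encoder) and is an assumption of this sketch.
    Returns (mean probability, predicted label), where label 1 means hateful.
    """
    if not captions:
        raise ValueError("at least one caption is required")
    # Score every caption variant independently.
    probs = [predict_proba(caption) for caption in captions]
    # Averaging smooths out the variance introduced by caption sampling.
    avg = mean(probs)
    return avg, int(avg >= threshold)


if __name__ == "__main__":
    # Toy scores standing in for real model outputs on three caption variants.
    toy_scores = {"caption A": 0.9, "caption B": 0.2, "caption C": 0.7}
    avg_prob, label = tta_predict(list(toy_scores), toy_scores.__getitem__)
    print(avg_prob, label)
```

In this toy run a single low-scoring caption does not flip the decision, which is the variance-reduction effect the abstract attributes to TTA.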

Details

Paper ID
lrec2026-ws-chipsal-32
Pages
pp. 308-315
BibKey
thapaliya-2026-egrantha
Editors
Kengatharaiyer Sarveswaran, Ashwini Vaidya
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Second workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Anish Thapaliya
