Multimodal Hate and Sentiment Understanding in Low-Resource Text-Embedded Images for Online Safety and Digital Well-being
Proceedings of the Second Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2026)
Abstract
This paper presents an overview of the Shared Task on Multimodal Hate and Sentiment Understanding in Low-Resource Memes, organized as part of the Second Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2026) at LREC 2026. The task addresses automated content understanding in low-resource settings by focusing on monolingual Nepali memes written in Devanagari script. Built upon the NeMeme dataset, the task comprises two subtasks: (1) binary hate speech detection and (2) three-class sentiment analysis. The competition attracted 23 teams for hate detection and 13 teams for sentiment analysis. Participating teams employed diverse strategies, including late-fusion multimodal architectures that combine multilingual text encoders with vision models, caption-based approaches using large vision-language models, and ensemble techniques. The top-performing system achieved macro-F1 scores of 80.52% on hate detection and 68.81% on sentiment analysis using a late-fusion hybrid architecture with discriminative learning rates. Our analysis reveals that multimodal fusion consistently outperforms unimodal baselines, that sentiment analysis is harder than hate detection because of its greater semantic nuance, and that the scarcity of Devanagari-centric pretrained models remains a significant bottleneck. This shared task establishes a benchmark for multimodal understanding in low-resource South Asian languages and provides insights for developing inclusive content moderation systems.
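Both subtasks are ranked by macro-F1, which averages per-class F1 scores with equal weight regardless of class frequency. The following is a minimal sketch of that metric in plain Python, not the organizers' official scoring script. The function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either sequence."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    labels = set(y_true) | set(y_pred)
    per_class_f1 = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)

    return sum(per_class_f1) / len(per_class_f1)


# Toy, made-up labels for the binary hate detection subtask.
gold = ["hate", "hate", "non-hate", "non-hate"]
pred = ["hate", "non-hate", "non-hate", "non-hate"]
score = macro_f1(gold, pred)
print(f"macro-F1 = {score:.4f}")
```

Because every class contributes equally, a system that ignores a minority class is penalized even when its overall accuracy is high. The same function applies unchanged to the three-class sentiment subtask.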