NeuralNoodles@CHiPSAL 2026: Late-Fusion Multimodal Stacking for Nepali Meme Sentiment Classification
Proceedings of the Second Workshop on Challenges in Processing South Asian Languages (CHiPSAL2026)
Abstract
Memes have emerged as an essential medium of online expression, in which sentiment is determined by the interaction of text and image. Sentiment analysis of memes is particularly challenging for low-resource languages such as Nepali, owing to scarce resources and the complex relationships between the text and image modalities. In this paper, we report our submission to Subtask B of CHiPSAL 2026, which targets sentiment analysis of Nepali text-embedded memes over three classes: Negative, Neutral, and Positive. We present a late-fusion multimodal framework that combines lexical, semantic, and visual models through a cross-validated stacking approach. Our submission achieved a macro F1 of 0.5045 on the official test set, placing 6th on the leaderboard. This result suggests that well-structured late-fusion approaches are effective for multimodal sentiment analysis of text-embedded memes.
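To make the idea of cross-validated stacking concrete, the following is a minimal sketch, not the submitted system. The abstract does not name the base models, the features, or the meta-learner, so everything here is an assumption for illustration: random synthetic arrays stand in for the lexical, semantic, and visual feature views, scikit-learn logistic regression is used for both the base models and the meta-learner, and the helper names `make_synthetic_views` and `stack_late_fusion` are hypothetical. Each base model produces out-of-fold class probabilities on the training set, and the meta-learner is trained on their concatenation, so it never sees predictions made on data a base model was fitted on.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Class order assumed for illustration: 0 = Negative, 1 = Neutral, 2 = Positive.
N_CLASSES = 3


def make_synthetic_views(n_samples=300, seed=0):
    """Build three random feature views (placeholders for real lexical,
    semantic, and visual features) that share one label vector."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, N_CLASSES, size=n_samples)
    views = {}
    for name, dim, noise in [("lexical", 20, 2.0),
                             ("semantic", 16, 2.5),
                             ("visual", 12, 3.0)]:
        centers = rng.normal(size=(N_CLASSES, dim))
        views[name] = centers[y] + noise * rng.normal(size=(n_samples, dim))
    return views, y


def stack_late_fusion(train_views, y_train, test_views, n_splits=5, seed=0):
    """Late fusion by stacking: one base model per view, fused at the
    level of predicted class probabilities by a meta-learner."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof_blocks, test_blocks = [], []
    for name in sorted(train_views):
        base = LogisticRegression(max_iter=1000)
        # Out-of-fold probabilities: every training row is scored by a
        # model that did not see that row during fitting.
        oof = cross_val_predict(base, train_views[name], y_train,
                                cv=cv, method="predict_proba")
        oof_blocks.append(oof)
        # Refit on the full training set to score the held-out test rows.
        base.fit(train_views[name], y_train)
        test_blocks.append(base.predict_proba(test_views[name]))
    # Meta-learner sees only the concatenated per-view probabilities.
    meta = LogisticRegression(max_iter=1000)
    meta.fit(np.hstack(oof_blocks), y_train)
    return meta.predict(np.hstack(test_blocks))


views, y = make_synthetic_views()
split = 240  # first 240 rows for training, remaining 60 for testing
train_views = {k: v[:split] for k, v in views.items()}
test_views = {k: v[split:] for k, v in views.items()}

preds = stack_late_fusion(train_views, y[:split], test_views)
macro_f1 = f1_score(y[split:], preds, average="macro")
print(f"Macro F1 on synthetic held-out data: {macro_f1:.4f}")
```

The printed score reflects only the synthetic data and has no relation to the 0.5045 reported on the official test set. Macro F1 averages the per-class F1 scores with equal weight, so the three sentiment classes count equally regardless of how many memes fall into each.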