LREC 2026 (Main Conference)

Towards Improving Multimodal Machine Translation with LLMs: A Focus on Indic Languages

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4od6be42j78m

Abstract

Recent advances in Multimodal Machine Translation (MMT) attempt to resolve the ambiguity and polysemy that text alone leaves open by letting models draw additional contextual cues from paired images, thereby improving disambiguation and translation accuracy. Datasets such as Multi30K and Visual Genome have significantly advanced this line of research. However, these datasets do not always compel models to rely on visual information. The CoMMuTE dataset takes a stronger step in this direction: it is an evaluation benchmark built around ambiguous English sentences that can only be interpreted correctly with their accompanying images. In this work, we extend CoMMuTE to two Indic languages, introducing IndicCoMMuTE, an evaluation dataset for assessing MMT systems on low-resource Indic languages. We benchmark a range of open-source multimodal Large Language Models (< 15B parameters) and a strong text-only baseline across eight languages. We also fine-tune one of these LLMs on two Indic languages. Our findings provide insights into the strengths and limitations of LLMs and establish IndicCoMMuTE as a valuable benchmark for future research on Multimodal Machine Translation in Indic languages.

Details

Paper ID
lrec2026-main-698
Pages
pp. 8872-8882
BibKey
dash-etal-2026-improving
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Amulya Ratna Dash
  • Chirag Wadhwa
  • Yashvardhan Sharma
