
Do Multimodal LLMs Understand Order? Measuring the Fragility of Multimodal Reasoning under Input Order Perturbations

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4jtpgzks8pbr

Abstract

Multimodal reasoning has progressed rapidly with large vision-language models (LVLMs), yet their robustness under input variations remains underexplored. This study investigates positional bias in LVLMs on multimodal multiple-choice questions. Our analysis shows that model predictions are sensitive to both the ordering of answer choices and the ordering of modalities. We conduct a large-scale evaluation on MMMU, CVQA, and MMBench using fourteen representative models. Further analysis examines how question properties, including difficulty, domain, and image type, affect robustness. We also assess whether text-based mitigation strategies transfer to the VQA setting and perform ablation studies on self-consistency and reasoning complexity. Overall, our findings provide the first comprehensive understanding of positional bias from a vision-language perspective, highlighting key challenges in achieving stable multimodal reasoning.

Details

Paper ID
lrec2026-main-716
Pages
pp. 9118-9128
BibKey
wei-etal-2026-do
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Sheng-Lun Wei
  • Yu-Ling Liao
  • Hen-Hsen Huang
  • Hsin-Hsi Chen
