
Evaluating the Effect of Question Wording Variations on Answer Consistency in Large Language Models

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4k8j56pzchi7

Abstract

Large Language Models (LLMs) sometimes generate inconsistent answers when asked semantically equivalent questions expressed with different wordings. Such inconsistency may lead to decreased task performance or excessive agreement with users. This study investigates how question wording influences the answer consistency of LLMs, focusing on binary Yes/No questions. We design four types of paraphrasing patterns, namely synonym substitution, antonym substitution, addition of agreement-seeking expressions, and strengthened agreement-seeking expressions, and evaluate their impact on model outputs. Experiments with multiple open-source and commercial LLMs show that many models become more likely to answer "Yes" when agreement-seeking expressions are included, and that models are particularly vulnerable to antonym substitutions. Our analysis further suggests that some of these tendencies are already present in pretrained models and are not fully removed by post-training. We also provide insights into which factors are likely (or unlikely) to contribute to improving consistency. By providing a systematic evaluation framework, this work highlights the necessity of accounting for wording-induced biases in the development and deployment of LLMs.

Details

Paper ID
lrec2026-main-225
Pages
pp. 2874-2886
BibKey
takayama-etal-2026-evaluating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Junya Takayama
  • Masaya Ohagi
  • Tomoya Mizumoto
  • Katsumasa Yoshikawa
