Questionnaire Meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Millions of people take surveys every day, from market polls to medical questionnaires and customer feedback forms. These datasets capture valuable insights, yet the ability of large language models (LLMs) to process questionnaire data, where lists of questions are crossed with hundreds of respondent rows, remains underexplored. Current survey analysis tools (e.g., Qualtrics, SPSS, REDCap) are designed for human operators, leaving practitioners without evidence-based guidance on how best to represent questionnaires for LLM consumption. We address this gap by introducing QASU (Questionnaire Analysis and Structural Understanding), a benchmark that probes six structural skills, including answer lookup, respondent count, and multi-hop inference, across six serialization formats and multiple prompt strategies. Experiments on five LLMs (GPT-5-mini, Gemini-2.5-Flash, Qwen3-32B, Llama3-70B, Amazon Nova Lite) show that format choice significantly affects performance, yielding improvements of up to 9 percentage points over baseline formats, and reveal substantial gaps (10 to 30 percentage points) between proprietary and open-weight models. Self-augmented prompting yields model-dependent benefits: it proves effective for proprietary models but unreliable for open-weight alternatives. By systematically isolating format and prompting effects, our open-source benchmark offers practical guidance for advancing both research and real-world practice in LLM-based questionnaire analysis.
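To make the serialization-format axis concrete, the sketch below renders the same toy questionnaire data in two of many possible formats before it is placed in an LLM prompt. This is an illustrative assumption, not the paper's code: the field names (respondent_id, Q1, Q2) and the specific JSON and markdown-table layouts are hypothetical stand-ins for the benchmark's six formats.

```python
# Illustrative sketch (not the QASU implementation): serializing the same
# questionnaire data two different ways before prompting an LLM.
# All field names and formats here are hypothetical.
import json

# Toy questionnaire: a list of questions crossed with respondent rows.
questions = {"Q1": "How satisfied are you?", "Q2": "Would you recommend us?"}
responses = [
    {"respondent_id": 1, "Q1": "Very satisfied", "Q2": "Yes"},
    {"respondent_id": 2, "Q1": "Neutral", "Q2": "No"},
]

def to_json(questions, responses):
    """One candidate format: nested JSON."""
    return json.dumps({"questions": questions, "responses": responses}, indent=2)

def to_markdown_table(questions, responses):
    """Another candidate format: a flat markdown table."""
    header = ["respondent_id"] + list(questions)
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in responses:
        lines.append("| " + " | ".join(str(row[k]) for k in header) + " |")
    return "\n".join(lines)

if __name__ == "__main__":
    # The task (e.g., an answer lookup) stays fixed; only the serialization varies.
    print(to_json(questions, responses))
    print(to_markdown_table(questions, responses))
```

Holding the question fixed while varying only the serialization, as in this sketch, is what allows format effects to be measured separately from prompting effects.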