LREC 2026 workshop

Automatic Evaluation of Multiple-Choice Items for Reading Comprehension: Effects of Question and Distractor Categories

Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026

DOI: 10.63317/2htm3v3vdbuv

Abstract

Automatic generation of multiple-choice (MC) items for reading comprehension can support language learning by providing large amounts of practice materials. To enable rapid development of MC generation models, automatic assessment is essential since it is time-consuming to manually evaluate question and distractor quality. Although Text Informativity (TI) has been adopted as an automatic evaluation metric, the ability of Large Language Models (LLMs) to estimate the TI scores of different categories of questions and distractors has not yet been thoroughly analyzed. This paper investigates LLM performance in calculating TI scores for the range of questions and distractors defined in the PIRLS (Progress in International Reading Literacy Study) and STARC (Structured Annotations for Reading Comprehension) frameworks. We show that automatically estimated TI scores may result in systematic preferences for some question and distractor categories, and recommend that TI scores be used for within-category comparisons only.
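The abstract does not define how TI is computed. The minimal sketch below assumes one common formulation: an item's TI is the model's answer accuracy when it sees the passage minus its accuracy when it does not, so items that can be answered from world knowledge alone score low. The category labels ("retrieve", "infer"), the field names and the numbers are hypothetical and are not taken from the paper, PIRLS or STARC. The sketch averages TI per category and ranks items only against other items in the same category, which is the within-category comparison the abstract recommends.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical items: a category label plus model accuracy with and without
# access to the passage (e.g. averaged over several sampled answers).
items = [
    {"id": "q1", "category": "retrieve", "acc_with_text": 1.0, "acc_without_text": 0.5},
    {"id": "q2", "category": "retrieve", "acc_with_text": 1.0, "acc_without_text": 0.75},
    {"id": "q3", "category": "infer", "acc_with_text": 0.75, "acc_without_text": 0.5},
    {"id": "q4", "category": "infer", "acc_with_text": 0.5, "acc_without_text": 0.5},
]


def text_informativity(item: dict) -> float:
    """Assumed TI: the accuracy gain the passage gives the model."""
    return item["acc_with_text"] - item["acc_without_text"]


def group_by_category(items: list[dict]) -> dict[str, list[dict]]:
    """Collect items under their category label."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        groups[item["category"]].append(item)
    return dict(groups)


def ti_by_category(items: list[dict]) -> dict[str, float]:
    """Mean TI per category, to expose systematic differences between categories."""
    return {
        cat: mean(text_informativity(it) for it in group)
        for cat, group in group_by_category(items).items()
    }


def rank_within_category(items: list[dict]) -> dict[str, list[dict]]:
    """Rank items by TI, highest first, comparing only items of the same category."""
    return {
        cat: sorted(group, key=text_informativity, reverse=True)
        for cat, group in group_by_category(items).items()
    }


if __name__ == "__main__":
    print(ti_by_category(items))
    for cat, ranked in rank_within_category(items).items():
        print(cat, [it["id"] for it in ranked])
```

If an estimator systematically inflates TI for one category, a single pooled ranking would favour that category's items whatever their quality; grouping first keeps each comparison inside one category.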

Details

Paper ID
lrec2026-ws-llms4ssh-18
Pages
pp. 170-174
BibKey
lee-etal-2026-automatic
Editors
Arturo Montejo-Raez, Cristina Grisot, Joanna Blochowiak, Nikola Ljubešić, Elena Battaner, German Rigau
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of Shaping Multilingual, Multimodal AI for the Social Sciences and Humanities (LLMs4SSH) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • John S. Y. Lee
  • Yin Poon
  • Shunjie Wang
  • Kai Wah Chu
