Back to Main Conference 2026
LREC 2026main

MultiWikiQA: A Reading Comprehension Benchmark in 300+ Languages

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2msrgsu9isrx

Abstract

We introduce a new reading comprehension dataset, dubbed MultiWikiQA, which covers 306 languages and has 1,220,757 samples in total. We start with Wikipedia articles, which also provide the context for the dataset samples, and use an LLM to generate question/answer pairs related to the Wikipedia article, ensuring that the answer appears verbatim within the article. Next, the question is then rephrased to hinder simple word matching methods from performing well on the dataset. We conduct a crowdsourced human evaluation of the fluency of the generated questions, which included 156 respondents across 30 of the languages (both low- and high-resource). All 30 languages received a mean fluency rating above “mostly natural”, showing that the samples are of good quality. We evaluate 6 different language models, both decoder and encoder models of varying sizes, showing that the benchmark is sufficiently difficult and that there is a large performance discrepancy amongst the languages. Both the dataset and survey evaluations are publicly available.

Details

Paper ID
lrec2026-main-499
Pages
pp. 6298-6311
BibKey
smart-2026-multiwikiqa
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 16 May 2026

Authors

  • DS

    Dan Saattrup Smart

Links