MultiZebraLogic: A Multilingual Logical Reasoning Benchmark

The Fourth Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2026)

DOI:10.63317/47jt2j8274nd

Abstract

We create high-quality datasets for evaluating the logical reasoning skills of LLMs across nine languages, all of which have been manually checked by fluent speakers. The datasets consist of so-called zebra puzzles, and we analyse different ways of tuning the difficulty of the puzzles to fit modern LLMs. These include the size of the puzzle (number of objects and number of clues), as well as a novel addition of red herring clues containing only irrelevant information. We show that the presence of red herrings indeed makes the puzzles significantly harder for the models, and we find that puzzle sizes 2×3 and 4×5 are sufficiently challenging for GPT-4o mini (a non-reasoning model) and o3-mini (a reasoning model), respectively. We analyse whether the performance of these models is sensitive to the language, the cultural sensitivity of the puzzle theme, and the choice of clue types. These analyses are conducted in English and Danish, and we find no significant difference for any of these three aspects, at least for the OpenAI models GPT-4o mini and o3-mini, chosen as representative non-reasoning and reasoning models, respectively. We publish the datasets for each of the nine languages for the identified sizes 2×3 and 4×5. We also publish the code used to generate the puzzles, which can be used to extend the benchmark to more languages.
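
To make the puzzle format concrete, the following is a minimal illustrative sketch of a zebra puzzle with one red herring clue, solved by brute force. It is not the authors' published generator. It reads the size 2×3 as two houses and three attribute categories, which is an assumption rather than something the abstract states, and all attribute values, clue types, and function names (`all_assignments`, `solve`, `in_house`, `same_house`) are invented for illustration. The red herring is modelled as a clue that holds in every assignment, so it adds text but no constraint.

```python
from itertools import permutations, product

# Hypothetical 2x3 puzzle: 2 houses, 3 attribute categories (all values invented).
ATTRIBUTES = {
    "nationality": ("Dane", "Swede"),
    "pet": ("cat", "dog"),
    "drink": ("tea", "coffee"),
}


def all_assignments(attributes):
    """Yield every candidate solution as {category: values ordered by house}."""
    categories = list(attributes)
    orderings = [list(permutations(attributes[c])) for c in categories]
    for combo in product(*orderings):
        yield dict(zip(categories, combo))


def solve(attributes, clues):
    """Return all assignments that satisfy every clue (brute force)."""
    return [a for a in all_assignments(attributes) if all(c(a) for c in clues)]


# Two illustrative clue types; each returns a predicate over an assignment.
def in_house(category, value, position):
    return lambda a: a[category][position] == value


def same_house(cat_a, val_a, cat_b, val_b):
    return lambda a: a[cat_a].index(val_a) == a[cat_b].index(val_b)


clues = [
    in_house("nationality", "Dane", 0),               # The Dane lives in the first house.
    same_house("nationality", "Dane", "pet", "cat"),  # The Dane owns the cat.
    same_house("pet", "dog", "drink", "tea"),         # The dog owner drinks tea.
]

# Assumed model of a red herring: a statement that is true in every
# assignment (e.g. "The Swede likes football"), so it cannot narrow the search.
red_herring = lambda a: True

solutions = solve(ATTRIBUTES, clues)
solutions_with_herring = solve(ATTRIBUTES, clues + [red_herring])
```

In this sketch the red herring leaves the solution set unchanged; any added difficulty comes from the solver having to recognise that the extra clue is irrelevant.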

Details

Paper ID
lrec2026-ws-resourceful-12
Pages
pp. 119-130
BibKey
bruun-etal-2026-multizebralogic
Editors
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
The Fourth Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Sofie Bruun

  • Dan Saattrup Smart
