The Swedish Benchmark of Linguistic Minimal Pairs

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/33cfy28hybv5

Abstract

We introduce the Swedish Benchmark of Linguistic Minimal Pairs, a dataset for evaluating syntactic performance in language models. It includes 2,500 minimal pairs organized into 25 syntactic phenomena, with 100 pairs per phenomenon. Each pair contrasts a well-formed and an ill-formed sentence that differ minimally. For each phenomenon, we manually constructed ten pairs from scratch. We semi-automatically generated the remaining 90 pairs and manually adjusted them. A random sample was assessed by 40 participants, who selected the well-formed sentence in 98.05% of cases. We evaluate eleven state-of-the-art models. Results generally show that models handle local agreement well but struggle with certain long-distance dependencies and word order phenomena. Model size seems to matter less than the training domain. Prompt-based evaluation generally lowers performance. We show that model performance is stable across handcrafted and generated subsets and across sample sizes, suggesting that 100 pairs per phenomenon suffice for reliable evaluation. Future work will expand the number of phenomena.
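The abstract does not say how a model's choice between the two sentences is scored. As a minimal sketch, assuming the convention common to minimal-pair benchmarks (a pair counts as correct when the model assigns a higher score, for example a log-probability, to the well-formed sentence), per-phenomenon accuracy could be computed as below. The names `MinimalPair`, `accuracy_by_phenomenon`, and `score` are hypothetical, and the toy pairs and scores are placeholders, not data from the benchmark.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple


class MinimalPair(NamedTuple):
    phenomenon: str  # name of the syntactic phenomenon (hypothetical labels)
    good: str        # well-formed sentence
    bad: str         # ill-formed sentence


def accuracy_by_phenomenon(
    pairs: Iterable[MinimalPair],
    score: Callable[[str], float],
) -> Dict[str, float]:
    """Fraction of pairs per phenomenon where the well-formed sentence scores higher.

    `score` is an assumed stand-in for a model: any function mapping a
    sentence to a number, where higher means "more acceptable".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pair in pairs:
        total[pair.phenomenon] += 1
        if score(pair.good) > score(pair.bad):
            correct[pair.phenomenon] += 1
    return {name: correct[name] / total[name] for name in total}


# Placeholder pairs and scores, invented for illustration only.
toy_pairs = [
    MinimalPair("agreement", "good-a1", "bad-a1"),
    MinimalPair("agreement", "good-a2", "bad-a2"),
    MinimalPair("word_order", "good-w1", "bad-w1"),
]
toy_scores = {
    "good-a1": -1.0, "bad-a1": -2.0,
    "good-a2": -3.0, "bad-a2": -2.5,
    "good-w1": -1.5, "bad-w1": -4.0,
}
result = accuracy_by_phenomenon(toy_pairs, toy_scores.__getitem__)
print(result)
```

With the benchmark's layout of 100 pairs for each of 25 phenomena, a function of this shape would return 25 accuracies, each a multiple of 0.01.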

Details

Paper ID: lrec2026-main-540
Pages: pp. 6783-6794
BibKey: sjons-etal-2026-swedish
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 978-2-493814-49-4
Conference: The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Johan Sjons
  • Fredrik Heinat
  • Murathan Kurfali

Links