Exploring Social Bias in Slovenia: The EEC-SL Dataset

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2pdt2x4ci6e5

Abstract

We introduce the EEC-SL dataset, an adaptation of the Equity Evaluation Corpus from English to Slovenian. Built from 11 sentence templates, the dataset contains 8,640 sentences, including minimally distant sentence pairs that differ in exactly one of two variables: gender (female or male) or ethnicity (Slovenian or non-Slovenian). To validate our selection of personal names, we create a localised version of the Implicit Association Test for ethnic bias, in which participants show a significant implicit bias favouring Slovenian over non-Slovenian names. We use the dataset to evaluate social bias in three computational language models (large language models and an encoder-only transformer) on a sentiment analysis task, specifically valence prediction. We analyse the results in terms of differences in sentiment between minimally distant groups of sentences, supported by inferential tests. We find limited evidence of social bias with regard to ethnicity, and no evidence of gender bias, in any of the models evaluated.
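
The minimal-pair comparison described in the abstract can be illustrated with a short sketch. It is not the authors' pipeline: the template, the placeholder names, the emotion words and the `toy_valence` scorer are all hypothetical stand-ins, and a real evaluation would replace the scorer with a sentiment model and follow the mean difference with an inferential test.

```python
from statistics import mean

# Hypothetical template, placeholder names and emotion words.
# None of these are taken from EEC-SL; they only illustrate the idea.
TEMPLATE = "{name} feels {emotion}."
EMOTIONS = ["happy", "sad"]
NAME_PAIRS = [("NAME_F1", "NAME_M1"), ("NAME_F2", "NAME_M2")]  # (female, male)


def build_pairs(template, name_pairs, emotions):
    """Return sentence pairs that differ only in the name slot."""
    return [
        (template.format(name=a, emotion=e), template.format(name=b, emotion=e))
        for a, b in name_pairs
        for e in emotions
    ]


def toy_valence(sentence):
    """Stand-in scorer (assumption): averages a tiny word lexicon.

    Unknown words, including names, get a neutral score of 0.5.
    """
    lexicon = {"happy": 0.9, "sad": 0.1}
    words = sentence.rstrip(".").split()
    return mean(lexicon.get(word, 0.5) for word in words)


def mean_pair_difference(pairs, scorer):
    """Mean of (score of first sentence - score of second) over all pairs."""
    return mean(scorer(first) - scorer(second) for first, second in pairs)


pairs = build_pairs(TEMPLATE, NAME_PAIRS, EMOTIONS)
gap = mean_pair_difference(pairs, toy_valence)
print(f"{len(pairs)} pairs, mean valence gap = {gap:.3f}")
```

Because the toy scorer ignores names, the gap here is zero by construction; a non-zero gap from a real model would indicate that the name alone shifts the predicted valence.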

Details

Paper ID
lrec2026-main-318
Pages
pp. 4019-4030
BibKey
caporusso-etal-2026-exploring
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Jaya Caporusso
  • Damar Hoogland
  • Boshko Koloski
  • Matthew Purver
  • Senja Pollak
  • Spela Vintar

Links