Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

Click the edit button next to a field to report a correction.
Enter the corrected value for each field you want to change.
Provide your name and email so we can contact you if needed.

Paper Information

lrec2026-main-664

Are Social Biases in LLMs Consistent across Generative Tasks? A Case Study for Basque

Paper Fields

Abstract

Most bias benchmarks for Large Language Models (LLMs) rely on multiple-choice formats, overlooking subtler biases that emerge in open-ended text generation. This gap is particularly relevant for low-resource languages like Basque, where culturally grounded evaluation resources are limited. We introduce BasqBBG (Basque Bias Benchmark for Generation), the first systematic benchmark for social bias in Basque Natural Language Generation (NLG), covering eight bias categories, including a newly added feminism dimension, adapted from the BasqBBQ dataset. We validate an LLM-as-a-Judge framework against expert human evaluations on two NLG tasks (story continuation and generative QA), achieving strong agreement (0.78 on bias presence and 0.92 on bias directionality). We then scale this approach to ten additional tasks and five models. Results show that bias levels vary markedly across tasks and depend more on model family than on size: Llama-based models exhibit higher and less consistent bias (45–50%), whereas GPT-4o and the Gemma-based Kimu-9B remain substantially fairer (≤20%). Our findings highlight the need for task-aware, language-specific frameworks to assess social bias in generative LLMs.

Keywords: Large Language Models, Social Bias, Basque, Natural Language Generation, Benchmarking, Manual Evaluation, LLM-as-a-judge.

Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.

PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.

Your Information

Name

Comment

Author Declaration *

I declare that I have notified all co-authors of the proposed corrections and obtained their consent, and that all modifications adhere to research ethics standards and the LREC correction policy.
