Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

Click the edit button next to a field to report a correction.
Fill in the suggested correction value for each field you want to correct.
Provide your name and email so we can contact you if needed.

Paper Information

lrec2026-ws-kgllm-19

A Wikidata-Based Framework to Measure Cross-Lingual Bias in Multilingual Large Language Models

Paper Fields

Abstract

Multilingual large language models (LLMs) are increasingly used for factual question answering, yet their accuracy varies across languages in ways that are difficult to interpret. A central challenge is that many multilingual probing benchmarks conflate multiple factors: the language used to ask the question, the cultural-linguistic context of the entities being queried, and the popularity skew of entities. In this paper, we disentangle these factors by asking: (i) how strongly does the Language of the Question (LoQ) affect factual recall, (ii) does matching LoQ to an entity-associated Language of the Entity (LoE) improve performance, and (iii) do these effects persist when entity popularity is controlled. To this end, we introduce WILA-PopQA, a new Wikidata-grounded benchmark spanning 9 languages with matched popularity profiles, and probe 12 open-weight models of varying sizes and architectures under aligned and misaligned LoQ–LoE conditions. We evaluate models’ answers to four types of questions about entity biographical properties in all selected languages. Results show that LoQ is the dominant source of variation. LoQ–LoE alignment does not consistently yield the highest accuracy, and performance depends on the property being asked. These results suggest that prompt language is an actionable experimental factor for multilingual factual evaluation.

PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.

Your Information

Name

Comment

Author Declaration *

I declare that I have notified all co-authors of the proposed corrections and obtained their consent, and that all modifications adhere to research ethics standards and the LREC correction policy.