LREC 2026 (main conference)

The Chronicles of RiDiC: Generating Datasets with Controlled Popularity Distribution for Long-form Factuality Evaluation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI: 10.63317/4iz2mc2bikvt

Abstract

We present a configurable pipeline, together with its code, for generating multilingual sets of entities with specified characteristics, such as domain, geographical location, and popularity, from Wikipedia and Wikidata data. The resulting datasets are intended for evaluating the factuality of long-form LLM generation, complementing evaluation based on short-form QA datasets. As an example of this approach, we present the RiDiC dataset, which contains 3,000 entities from three domains (rivers, natural disasters, and car models) spanning different popularity tiers. Each entity is accompanied by its geographical location, its English and Chinese names (where available), and relevant English and Chinese Wikipedia content, which is used as the reference for evaluating LLM responses. We obtained generations about RiDiC entities from three LLMs in English and Chinese and evaluated them with a third-party factuality checker; the evaluation showed that entities from our dataset caused even frontier models to hallucinate. The code, data, and generation and evaluation scripts have been released so that the approach can be extended to new LLMs, languages, and domains.
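The idea of a "controlled popularity distribution" can be illustrated with a small stratified-sampling sketch. It assumes that each candidate entity already carries a single numeric popularity score (for example, Wikipedia page views or a Wikidata sitelink count; which signal RiDiC actually uses is not stated here), that tiers are equal-frequency (quantile) buckets, and that entities are drawn uniformly within each tier. The names `Entity`, `assign_tiers` and `sample_per_tier` are hypothetical and are not taken from the released code.

```python
"""Hypothetical sketch: bucket candidate entities into popularity tiers
and draw an equal-sized sample from each tier (not the authors' code)."""
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    qid: str           # Wikidata identifier (illustrative)
    domain: str        # e.g. "river", "disaster", "car_model"
    popularity: float  # ASSUMED popularity signal, e.g. page views


def assign_tiers(entities, n_tiers):
    """Split entities into n_tiers equal-frequency buckets by popularity.

    Tier 0 holds the least popular entities, tier n_tiers - 1 the most popular.
    """
    ranked = sorted(entities, key=lambda e: e.popularity)
    tiers = [[] for _ in range(n_tiers)]
    for rank, entity in enumerate(ranked):
        # rank * n_tiers // len(ranked) is always in [0, n_tiers - 1]
        tiers[rank * n_tiers // len(ranked)].append(entity)
    return tiers


def sample_per_tier(entities, n_tiers, per_tier, seed=0):
    """Return {tier_index: up to per_tier entities drawn uniformly from it}."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = {}
    for i, tier in enumerate(assign_tiers(entities, n_tiers)):
        sample[i] = rng.sample(tier, min(per_tier, len(tier)))
    return sample


if __name__ == "__main__":
    # Toy, synthetic candidates with popularity 1..100 (not real data).
    pool = [Entity(f"Q{i}", "river", float(i)) for i in range(1, 101)]
    for tier, items in sample_per_tier(pool, n_tiers=3, per_tier=5).items():
        print(tier, sorted(e.popularity for e in items))
```

Quantile tiers are used here because popularity signals such as page views are typically heavy-tailed: equal-width bins would leave the upper tiers nearly empty, whereas equal-frequency bins guarantee that every tier has enough candidates to sample from.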

Details

Paper ID
lrec2026-main-776
Pages
pp. 9893-9904
BibKey
braslavski-etal-2026-chronicles
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Pavel Braslavski
  • Dmitrii Iarosh
  • Nikita Sergeevich Sushko
  • Andrey Sakhovskiy
  • Vasily Konovalov
  • Elena Tutubalina
  • Alexander Panchenko
