
GeneFRDebate: Generated French Debates from News Articles with Industrial-Expert Summaries

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4zuibqqim37u

Abstract

Summarizing domain-specific conversations, such as political debates, remains challenging despite advances in large language models (LLMs), and resources for French debates are particularly limited. We present GeneFRDebate, a new dataset of synthetic French political debates generated from real-world news articles using an LLM, while keeping expert-written summaries unchanged. Our pipeline combines prompt engineering, human curation, and quality evaluation using both automatic metrics and expert assessment. We also provide baseline experiments with small-scale LLMs (≤8B parameters), demonstrating the dataset’s usefulness for training and evaluation. This work shows that carefully generated synthetic data with human oversight can complement existing corpora, supporting research in multilingual and domain-specific dialogue summarization.

Details

Paper ID
lrec2026-main-143
Pages
pp. 1831-1841
BibKey
abrougui-etal-2026-genefrdebate
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Rim Abrougui
  • Guillaume Lechien
  • Elisabeth Savatier
  • Benoît Laurent

Links