Legal Considerations in the Use of Synthetic Data for AI Development and Finetuning: The Case of LLMs4EU

Proceedings of the Joint Workshop on Legal and Ethical Issues in Human Language Technologies and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (LEGAL2026 and CALD-pseudo 2026) @ LREC 2026

DOI:10.63317/3vwisn8odtmp

Abstract

This paper examines the legal implications of using synthetic data to develop and fine‑tune general‑purpose AI models in the European Union, using the LLMs4EU project as a case study. It situates synthetic data within the Union’s broader data policy and highlights it as a candidate tool for reconciling data availability with regulatory constraints. From a data‑protection perspective, it analyses whether and when synthetic data should be classified as "personal data" under the GDPR. From a copyright and contractual standpoint, the paper assesses the risks that synthetic datasets may embed infringing content or derive from unlawfully trained models, in light of the GEMA v. OpenAI ruling on memorised works and emerging analyses of liability for AI‑generated outputs. It also considers the constraints that model licensing and acceptable‑use policies impose on using models to generate training data for other models. The paper concludes that synthetic data can play a valuable role in mitigating legal risks and enabling compliant AI development in LLMs4EU, but only if its generation and use are embedded in robust governance frameworks that address data protection, copyright and contractual obligations across the entire data value chain.

Details

Paper ID

lrec2026-ws-legal-10

Pages

pp. 86-90

DOI

10.63317/3vwisn8odtmp

BibKey

talmoudi-etal-2026-legal

Editors

Ingo Siegert, Maria Irena Szawerna, Khalid Choukri, Simon Dobnik, Paweł Kamocki, Therese Lindström Tiedemann, Pierre Lison, Ricardo Muñoz Sánchez, Ildikó Pilán, Lisa Södergård, Kossay Talmoudi, Elena Volodina, Xuan-Son Vu

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Kossay Talmoudi
Khalid Choukri
Amélie Gourgeot
Florine Astruc
