Balancing FAIR and GDPR: A Governance Framework for Oral Archives
Proceedings of the Joint Workshop on Legal and Ethical Issues in Human Language Technologies and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (LEGAL2026 and CALD-pseudo 2026) @ LREC 2026
Abstract
This paper presents a governance framework developed within the research project ROADS to support the sustainable management of oral archives, which constitute essential linguistic resources for interdisciplinary research and cultural heritage preservation. Oral archives raise complex ethical and legal challenges due to the hybrid nature of voice data, which function simultaneously as historical documents, scientific sources and biometric identifiers, thereby creating tensions between open science principles and data protection regulations. The proposed framework integrates FAIR principles (Findable, Accessible, Interoperable, Reusable) with Privacy by Design and the GDPR accountability principle through a multilayered approach. It introduces an access model that distinguishes between publicly available metadata and controlled access to identifiable audio materials, following trusted repository standards. The framework also incorporates consent management procedures and safeguards for legacy collections, enabling responsible data sharing while preserving scientific usability. More broadly, ROADS provides a transferable model to guide the transition from project-based archives to FAIR, sustainable and reusable research resources, ensuring compliance with data protection requirements and respect for the sensitivity of the documented contexts.