Automating FAIRness: A FAIRification Tool within the Language Resources Infrastructure

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

Abstract

In addition to technical interoperability, FAIRness encompasses governance, policy, and ethical aspects, reflecting how language data are produced, represented, and managed within research infrastructures. Ensuring FAIR compliance of language resources is essential for transparent and sustainable research in the social sciences and humanities, enabling data accessibility, quality, and long-term community reuse. The FAIRification Tool, created by CLARIN IT as part of the Humanities and Heritage Italian Open Science Cloud (H2IOSC), is a modular system that automates and enhances FAIR compliance for language resources. The tool builds upon and extends existing FAIR data assessment frameworks by combining automatic and human validation, a feedback dashboard, certification thresholds, and domain-specific extensions aligned with linguistic metadata standards. It supports FAIR-by-design practices by operationalizing FAIR concepts and embedding them into repository workflows, thereby promoting interoperability across CLARIN, H2IOSC, and EOSC. An initial evaluation conducted on a representative set of linguistic datasets revealed notable improvements (30–40%) in FAIR scores, particularly in the Findable and Reusable dimensions. The tool thus contributes to responsible, policy-aware, and transparent language data management within the European Open Science landscape.