
Derived Text Formats as Strategic Transformations of In-Copyright Materials to Support Open Science: A Survey

Proceedings of Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science (DTF) @ LREC 2026

DOI:10.63317/29k2v5o6rudf

Abstract

Derived Text Formats (DTFs) are the result of a strategic transformation of textual materials that are protected by copyright in their original form, such that the resulting data is useful for computational analyses and can be openly shared following best practices of Open Science without infringing copyright law. This paper aims to provide insights into several key aspects of this concept, which is closely related to notions such as corpus masking, non-consumptive research and extracted features. The paper establishes the motivation for using DTFs, discusses several foundational aspects of the concept and practice, describes ongoing research on issues including copyright, reconstructibility, evaluation and standardization of DTFs, and concludes with a roadmap for future work on DTFs. In this way, it provides a broad but concise overview of work on DTFs as a contribution to Open Science practices, with a focus on work in the Digital Humanities.

Details

Paper ID
lrec2026-ws-dtf-01
Pages
pp. 1-15
BibKey
schch-2026-derived
Editors
Florian Barth, Keli Du, José Calvo Tello, Philippe Genêt, Piroska Lendvai, Christof Schöch, Thorsten Trippel
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science (DTF) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Christof Schöch
