LREC 2026 workshop

Consolidating Syntactically Annotated Corpora with LLOD Technology. An Experiment in the Old Saxon Heliand

Proceedings of the 10th Workshop on Linked Data in Linguistics (LDL-2026)

DOI:10.63317/2xfa323m5wof

Abstract

The humanities are a vast and highly diverse field, both methodologically and technologically, so it is not surprising to see independent researchers or projects working on the same data and producing complementary but technically incompatible electronic editions of the same source material. We suggest that existing Linguistic Linked Open Data (LLOD) technology can play a crucial role in performing a post-hoc consolidation of such efforts. We illustrate this for the Old Saxon (Old Low German) Heliand, a 9th-century gospel harmony that three independent research projects have previously annotated for different aspects of syntax, based on different versions (editions and manuscripts) of the original text. We describe the derivation of a UD-compliant corpus from the consolidation of the existing annotations. This includes the transformation of the original annotations into corpus-specific CoNLL (TSV) formats, the alignment between the different corpora, and their integration. A particular challenge is the processing of incomplete annotations: one of the source corpora (Heliand B4) provides only non-recursive nominal and clausal chunks, and another (Heliand DDD) provides only sentence boundaries, clause types and parts of speech, with no actual phrasal structure. In this paper, we focus specifically on the application of Fintan (CoNLL-RDF) and SPARQL to perform the necessary graph rewriting operations.
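The step from corpus-specific CoNLL (TSV) files to an RDF graph can be illustrated with a minimal sketch. This is not the authors' Fintan/CoNLL-RDF pipeline: it uses only the Python standard library and emits N-Triples modelled loosely on CoNLL-RDF conventions as we understand them (each sentence a nif:Sentence node with token number 0, each token a nif:Word linked by nif:nextWord, each column value a conll: property, and HEAD turned into a link to another token or, for 0, to the sentence node). The namespace URIs are the ones we believe CoNLL-RDF uses; the BASE URI, the column labels and the function names are hypothetical.

```python
"""Sketch: CoNLL (TSV) -> N-Triples, loosely following CoNLL-RDF conventions.

Assumptions (not taken from the paper): the namespace URIs below, the
hypothetical BASE prefix, and the column labels passed in by the caller.
"""

CONLL = "http://ufal.mff.cuni.cz/conll2009-st/task-description.html#"
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/heliand#"  # hypothetical corpus base URI


def iri(uri: str) -> str:
    return f"<{uri}>"


def literal(value: str) -> str:
    # Escape backslashes and double quotes for N-Triples string literals.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def conll_to_ntriples(tsv: str, columns: list[str]) -> list[str]:
    """Convert blank-line-separated CoNLL sentences into N-Triples lines.

    `columns` labels the TSV columns. A column called "ID" supplies the token
    number used in the URI; a column called "HEAD" becomes a link to another
    token of the same sentence (HEAD 0 points to the sentence node).
    """
    triples: list[str] = []
    blocks = [b for b in tsv.strip().split("\n\n") if b.strip()]
    prev_sentence = None
    for s_num, block in enumerate(blocks, start=1):
        sentence = f"{BASE}s{s_num}_0"
        triples.append(f"{iri(sentence)} {iri(RDF_TYPE)} {iri(NIF + 'Sentence')} .")
        if prev_sentence is not None:
            triples.append(f"{iri(prev_sentence)} {iri(NIF + 'nextSentence')} {iri(sentence)} .")
        prev_sentence = sentence
        prev_word = None
        for line in block.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip comment lines and whitespace-only lines
            row = dict(zip(columns, line.split("\t")))
            word = f"{BASE}s{s_num}_{row['ID']}"
            triples.append(f"{iri(word)} {iri(RDF_TYPE)} {iri(NIF + 'Word')} .")
            if prev_word is not None:
                triples.append(f"{iri(prev_word)} {iri(NIF + 'nextWord')} {iri(word)} .")
            prev_word = word
            for col, value in row.items():
                if col == "ID" or value == "_":
                    continue  # ID is encoded in the URI; "_" means no annotation
                if col == "HEAD":
                    head = f"{BASE}s{s_num}_{value}"  # HEAD 0 -> sentence node
                    triples.append(f"{iri(word)} {iri(CONLL + 'HEAD')} {iri(head)} .")
                else:
                    triples.append(f"{iri(word)} {iri(CONLL + col)} {literal(value)} .")
    return triples
```

Once the annotations of each source corpus are available as triples in this shape, they can be queried and rewritten as a graph, which is the role the abstract assigns to Fintan and SPARQL; that rewriting step is not shown here.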

Details

Paper ID
lrec2026-ws-ldl-04
Pages
pp. 29-39
BibKey
chiarcos-etal-2026-consolidating
Editors
John P. McCrae, Katerina Gkirtzou, Fahad Khan, Patricia Martin Chozas, Sara Carvalho, Erin Canning
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 10th Workshop on Linked Data in Linguistics (LDL-2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Christian Chiarcos
  • Janine Siewert
