Consolidating Syntactically Annotated Corpora with LLOD Technology. An Experiment in the Old Saxon Heliand
Proceedings of the 10th Workshop on Linked Data in Linguistics (LDL-2026)
Abstract
The humanities are a vast and highly diverse field, both methodologically and technologically, so it is not surprising to see independent researchers or projects working on the same data and producing complementary but technically incompatible electronic editions of the same source material. We suggest that existing Linguistic Linked Open Data (LLOD) technology can play a crucial role in the post-hoc consolidation of such efforts. We illustrate this for the Old Saxon (Old Low German) Heliand, a 9th-century gospel harmony previously annotated for different aspects of syntax in three independent research projects and over different versions (editions and manuscripts) of the original text. We describe the derivation of a corpus compliant with Universal Dependencies (UD) from the consolidation of the existing annotations. This includes the transformation of the original annotations to corpus-specific CoNLL (TSV) formats, the alignment between the different corpora, and their integration. A particular challenge is the processing of incomplete annotations: one of the source corpora (Heliand B4) provides only non-recursive nominal and clausal chunks, and another (Heliand DDD) provides only sentence boundaries, clause types and parts of speech, but no actual phrasal structures. In this paper, we focus specifically on the application of Fintan (CoNLL-RDF) and SPARQL for performing the necessary graph rewriting operations.
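To make the alignment and integration step more concrete, the following is a minimal sketch in Python, not the Fintan/SPARQL pipeline used in this paper. It assumes two corpus-specific TSV-like token lists over slightly different text versions, aligns them on normalized word forms with the standard-library `difflib.SequenceMatcher`, and merges their annotation columns, padding unmatched tokens with `_`. The function names, the normalization rule (`uu` to `w`), the column layout, and the toy tokens and labels are all illustrative assumptions and are not taken from the source corpora.

```python
from difflib import SequenceMatcher


def normalize(form):
    """Reduce spelling variation before alignment (hypothetical rule)."""
    return form.lower().replace("uu", "w")


def align(rows_a, rows_b):
    """Align two (form, annotation) lists and merge their annotations.

    Returns (form, annotation_a, annotation_b) triples; a token present
    in only one version gets "_" for the missing annotation.
    """
    keys_a = [normalize(form) for form, _ in rows_a]
    keys_b = [normalize(form) for form, _ in rows_b]
    matcher = SequenceMatcher(a=keys_a, b=keys_b, autojunk=False)

    merged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same normalized forms: combine both annotation columns.
            for (form, ann_a), (_, ann_b) in zip(rows_a[i1:i2], rows_b[j1:j2]):
                merged.append((form, ann_a, ann_b))
        else:
            # Divergent stretch: keep every token, pad the missing column.
            for form, ann_a in rows_a[i1:i2]:
                merged.append((form, ann_a, "_"))
            for form, ann_b in rows_b[j1:j2]:
                merged.append((form, "_", ann_b))
    return merged


# Toy data (invented for illustration): version A carries parts of speech,
# version B carries chunk labels and contains one extra token.
corpus_a = [("uuord", "NOUN"), ("godes", "NOUN"), ("thar", "ADV")]
corpus_b = [("word", "B-NP"), ("godes", "I-NP"), ("endi", "O"), ("thar", "O")]

merged = align(corpus_a, corpus_b)
for row in merged:
    print("\t".join(row))
```

In this sketch, word forms are taken from the first corpus whenever both versions agree after normalization. A graph-based approach such as CoNLL-RDF would instead keep both token sequences and link them, so that neither version of the text is lost.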