LREC 2026 (main conference)

From Semi-Digital Edition to Historical NLP Resource: Constructing and Annotating Historical Multilingual Parallel Text Collections on the TEITOK Platform

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI: 10.63317/2okpwwaemhsn

Abstract

We construct a multilingual, parallelized digital collection comprising a reconstructed Old Greek text from the 4th century CE and its seven historical versions, modern editions, and translations. We describe the workflow and integrated tools on the TEITOK web-based platform for ingesting, aligning, parallelizing and morphosyntactically annotating these materials. Textual alignment is performed at both the sentence and word levels, after which the data are annotated with dependency parses in the Universal Dependencies paradigm. The newly created and manually post-corrected collection can be explored via advanced parallel search functionalities and flexible visualization modes. This workflow is meant to support digital humanities and historical NLP projects by transforming the input texts into parallel NLP resources, enabling cross-fertilization and new insights across multiple research communities.

Details

Paper ID
lrec2026-main-120
Pages
pp. 1553-1561
BibKey
janssen-etal-2026-semi
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Maarten Janssen

  • Anna Jouravel

  • Piroska Lendvai

Links