
OTA-BOUN: A Historical Turkish Dependency Treebank

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/3d985kzhy84r

Abstract

We present OTA-BOUN v2.0, the largest Universal Dependencies treebank for historical Turkish, consisting of 1,742 manually verified sentences sampled from late Ottoman texts. The annotation process followed a semi-automatic methodology: pre-annotation by the UDPipe 2.0 pipeline was refined through manual annotation of dependency relations, part-of-speech tags, and lemmas. A distinctive feature of OTA-BOUN is its dual-script representation: each sentence is provided both in the original Perso-Arabic script and in its Latinized transcription, and tokens include aligned forms in both scripts. This dual-layer design enables research on script conversion, cross-lingual transfer, and historical–modern Turkish comparisons. Through detailed analyses of the treebank, this study presents a unique and scalable resource that advances computational studies of historical Turkish and supports broader efforts in multilingual and diachronic NLP.

Details

Paper ID
lrec2026-main-551
Pages
pp. 6929-6938
BibKey
tra-etal-2026-ota
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Tarık Emre Tıraş

  • Nureddin Cüneyd Ünal

  • Ada Cengiz

  • Ece Yurtseven

  • Esma F. Bilgin Taşdemir

  • Saziye Betul Ozates

Links