Word segmentation for UD: a comparison of isiZulu and Sepedi

Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)

DOI:10.63317/35ihcv2cfbzq

Abstract

The Southern Bantu language family contains languages with so-called conjunctive and disjunctive orthographies. In languages with conjunctive orthographies, such as isiZulu, orthographic words correspond to linguistic words, whereas in languages with disjunctive orthographies, such as Sepedi, prefix morphemes of verbs and other predicates are written as separate orthographic words. When developing Universal Dependencies treebanks, the basic principle is to annotate syntactic (linguistic) words, but for languages with agglutinating morphology, it has been argued that this reduces the informativeness of the treebank. In this paper we investigate this claim by analysing and measuring the effects of annotating Universal Dependencies on the basis of orthographic words in two morphosyntactically parallel treebanks for isiZulu and Sepedi.

Details

Paper ID
lrec2026-ws-udw-10
Pages
pp. 116-127
BibKey
marais-etal-2026-word
Editors
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Ninth Workshop on Universal Dependencies (UDW 2026)
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Laurette Marais

  • Laurette Pretorius