LREC-COLING 2024 workshop

The First Universal Dependency Treebank for Tswana: Tswana-Popapolelo

Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024

DOI:10.63317/3y3hwmppsppm

Abstract

This paper presents the first publicly available UD treebank for Tswana, Tswana-Popapolelo. The data consists of the 20 Cairo CICLing sentences translated into Tswana. After pre-processing these sentences with detailed POS tags (XPOS) and converting these to universal POS tags (UPOS), we annotated the data with dependency relations, documenting the decisions made for language-specific constructions. The linguistic issues encountered are described in detail, as this is the first application of the UD framework to produce a dependency treebank for the Bantu language family in general and for Tswana specifically.

Details

Paper ID
lrec2024-ws-rail-07
Pages
pp. 55-65
BibKey
gaustad-etal-2024-first
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
Date
20-25 May 2024

Authors

  • Tanja Gaustad

  • Ansu Berg

  • Rigardt Pretorius

  • Roald Eiselen
