LREC 2006 main

Grammar-based tools for the creation of tagging resources for an unresourced language: the case of Northern Sotho

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/4jt2hb3ncjsi

Abstract

We describe an architecture for the parallel construction of a tagger lexicon and an annotated reference corpus for the part-of-speech tagging of Northern Sotho, a Bantu language of South Africa for which no tagged resources have been available so far. Our tools make use of grammatical properties (morphological and syntactic) of the language. We use symbolic pretagging, followed by stochastic tagging, an architecture that proves useful not only for the bootstrapping of tagging resources, but also for the tagging of any new text. We discuss the tagset design, the tool architecture and the current state of our ongoing effort.
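The two-stage design named in the abstract (symbolic pretagging that restricts each token to its admissible tags, followed by a stochastic tagger that chooses among them) can be illustrated with a minimal sketch. Everything concrete below is a hypothetical toy assumption, not the authors' tagset, lexicon, tools or data: the token names (`tok_a`, `tok_b`, `tok_c`), the tag labels (`CONC`, `PRON`, `V`, `N`), the open-class fallback and the bigram probabilities are invented, and a simple tag-bigram Viterbi decoder stands in for whatever stochastic tagger the paper actually uses.

```python
import math

START = "<s>"  # sentence-start pseudo-tag

# Hypothetical toy lexicon: each word maps to the set of tags the
# symbolic pretagger admits for it. Ambiguous words get several tags.
LEXICON = {
    "tok_a": {"CONC", "PRON"},
    "tok_b": {"V"},
    "tok_c": {"N"},
}

# Hypothetical fallback candidates for words missing from the lexicon.
OPEN_CLASS = {"N", "V"}

# Hypothetical tag-bigram probabilities P(tag | previous tag), invented for illustration.
TRANS = {
    (START, "CONC"): 0.6, (START, "PRON"): 0.3, (START, "N"): 0.1,
    ("CONC", "V"): 0.9, ("PRON", "V"): 0.2, ("PRON", "N"): 0.5,
    ("V", "N"): 0.7, ("N", "V"): 0.4,
}
FLOOR = 1e-6  # probability used for bigrams not listed in TRANS


def pretag(tokens):
    """Symbolic step: restrict each token to its admissible tag set."""
    return [LEXICON.get(tok, OPEN_CLASS) for tok in tokens]


def viterbi(candidates):
    """Stochastic step: most probable tag sequence, searching only pretagged candidates."""
    # best[tag] = (log score, path) of the best path ending in that tag
    best = {START: (0.0, [])}
    for cands in candidates:
        new_best = {}
        for tag in sorted(cands):
            new_best[tag] = max(
                (score + math.log(TRANS.get((prev, tag), FLOOR)), path + [tag])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


def tag(tokens):
    """Pretag symbolically, then disambiguate stochastically."""
    return viterbi(pretag(tokens))
```

Because the Viterbi search only ranges over tags the pretagger admits, the stochastic model can never assign a tag that the lexicon or grammar rules have excluded. In this toy setup, `tag(["tok_a", "tok_b", "tok_c"])` resolves the ambiguous `tok_a` to `CONC`, since the invented model prefers a concord before a verb.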

Details

Paper ID
lrec2006-main-220
Pages
N/A
BibKey
heid-etal-2006-grammar
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24–26 May 2006

Authors

  • Ulrich Heid

  • Elsabé Taljard

  • Danie J. Prinsloo
