Government Domain Named Entity Recognition for South African Languages

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/5mcyozqscxag

Abstract

This paper describes the named entity language resources developed as part of a development project for the South African languages. The development efforts focused on creating protocols and annotated data sets with at least 15,000 annotated named entity tokens for ten of the official South African languages. The description of the protocols and annotated data sets provides an overview of the problems encountered during annotation. Based on these annotated data sets, CRF named entity recognition systems are developed that leverage existing linguistic resources. The newly created named entity recognisers are evaluated, achieving F-scores between 0.64 and 0.77, and an error analysis is performed to identify possible avenues for improving the quality of the systems.

Details

Paper ID
lrec2016-main-533
Pages
pp. 3344-3348
BibKey
eiselen-2016-government
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 to 28 May 2016

Authors

  • Roald Eiselen
