
South African Language Resources: Phrase Chunking

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/4rp4j9t29kku

Abstract

Phrase chunking remains an important natural language processing (NLP) technique for intermediate syntactic processing. This paper describes the development of protocols, annotated phrase chunking data sets and automatic phrase chunkers for ten South African languages. Problems with adapting the existing English annotation protocols are discussed, and an overview of the annotated data sets is given. Based on the annotated sets, CRF-based phrase chunkers are created and tested with combinations of different features, including part-of-speech tags and character n-grams. The evaluation results show that disjunctively written languages can achieve notably better phrase chunking results with a limited data set than conjunctively written languages, but that adding character n-grams improves the results for conjunctive languages.
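As a rough illustration of the two feature types named in the abstract, the sketch below builds per-token feature dictionaries from part-of-speech tags and character n-grams. It is only a minimal sketch under stated assumptions: the function names, the n-gram range (2 to 4), the boundary markers and the one-token context window are all hypothetical choices made here, not the feature templates used in the paper.

```python
def char_ngrams(token, n_min=2, n_max=4):
    """Return all character n-grams of a token, with < and > marking its edges.

    The range 2..4 is an assumption for illustration, not taken from the paper.
    """
    padded = f"<{token}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def token_features(tokens, pos_tags, i):
    """Build a feature dictionary for the token at position i."""
    word = tokens[i].lower()
    feats = {"word.lower": word, "pos": pos_tags[i]}
    # Character n-grams expose sub-word units such as affixes, which is
    # plausibly why they help for conjunctively written languages.
    for gram in char_ngrams(word):
        feats[f"ngram={gram}"] = 1.0
    # POS tags of the neighbouring tokens (hypothetical window of one).
    feats["pos-1"] = pos_tags[i - 1] if i > 0 else "BOS"
    feats["pos+1"] = pos_tags[i + 1] if i < len(tokens) - 1 else "EOS"
    return feats


def sentence_features(tokens, pos_tags):
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

A list of such dictionaries per sentence, paired with one chunk label per token, is the kind of input that common CRF toolkits accept for training.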

Details

Paper ID
lrec2016-main-109
Pages
pp. 689-693
BibKey
eiselen-2016-south
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 to 28 May 2016

Authors

  • Roald Eiselen

Links