Unexpected Productions May Well be Errors

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/4jdvmp39s7oy

Abstract

We present a method for detecting annotation errors in treebanks. It assumes that errors are unexpected small tree fragments. We generate statistics over configurations of these fragments using a standard statistical test. We use the test result and the characteristics of their distributions as features to classify unseen configurations as likely errors via machine learning. Evaluation shows that the resulting list of error candidates is reliable, independent of corpus size, annotation quality, and target language.

Details

Paper ID
lrec2004-main-287
Pages
N/A
BibKey
ule-simov-2004-unexpected
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 to 28 May 2004

Authors

  • Tylman Ule

  • Kiril Simov
