Back to Main Conference 2008
LREC 2008main

Approximating Learning Curves for Active-Learning-Driven Annotation

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/4smeu6r7yan5

Abstract

Active learning (AL) is getting more and more popular as a methodology to considerably reduce the annotation effort when building training material for statistical learning methods for various NLP tasks. A crucial issue rarely addressed, however, is when to actually stop the annotation process to profit from the savings in efforts. This question is tightly related to estimating the classifier performance after a certain amount of data has already been annotated. While learning curves are the default means to monitor the progress of the annotation process in terms of classifier performance, this requires a labeled gold standard which - in realistic annotation settings, at least - is often unavailable. We here propose a method for committee-based AL to approximate the progression of the learning curve based on the disagreement among the committee members. This method relies on a separate, unlabeled corpus and is thus well suited for situations where a labeled gold standard is not available or would be too expensive to obtain. Considering named entity recognition as a test case we provide empirical evidence that this approach works well under simulation as well as under real-world annotation conditions.

Details

Paper ID
lrec2008-main-208
Pages
N/A
BibKey
tomanek-hahn-2008-approximating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • KT

    Katrin Tomanek

  • UH

    Udo Hahn

Links