Back to Main Conference 2010
LREC 2010main

Annotation Time Stamps — Temporal Metadata from the Linguistic Annotation Process

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/4qee7r7zw5cy

Abstract

We describe the re-annotation of selected types of named entities (persons, organizations, locations) from the Muc7 corpus. The focus of this annotation initiative is on recording the time needed for the linguistic process of named entity annotation. Annotation times are measured on two basic annotation units -- sentences vs. complex noun phrases. We gathered evidence that decision times are non-uniformly distributed over the annotation units, while they do not substantially deviate among annotators. This data seems to support the hypothesis that annotation times very much depend on the inherent ""hardness"" of each single annotation decision. We further show how such time-stamped information can be used for empirically grounded studies of selective sampling techniques, such as Active Learning. We directly compare Active Learning costs on the basis of token-based vs. time-based measurements. The data reveals that Active Learning keeps its competitive advantage over random sampling in both scenarios though the difference is less marked for the time metric than for the token metric.

Details

Paper ID
lrec2010-main-443
Pages
N/A
BibKey
tomanek-hahn-2010-annotation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • KT

    Katrin Tomanek

  • UH

    Udo Hahn

Links