
MomResp: A Bayesian Model for Multi-Annotator Document Labeling

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI: 10.63317/226e2iyb2ga9

Abstract

Data annotation in modern practice often involves multiple, imperfect human annotators. Multiple annotations can be used to infer estimates of the ground-truth labels and to estimate individual annotator error characteristics (or reliability). We introduce MomResp, a model that incorporates information from both natural data clusters and annotations from multiple annotators to infer ground-truth labels and annotator reliability for the document classification task. We implement this model and show dramatic improvements over majority vote in situations where annotations are scarce and annotation quality is low, as well as in situations where annotators disagree consistently. Because MomResp predictions are subject to label switching, we introduce a solution that finds nearly optimal predicted class reassignments in a variety of settings using only information available to the model at inference time. Although MomResp does not perform well in annotation-rich situations, we show evidence suggesting how this shortcoming may be overcome in future work.
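Two ideas from the abstract can be illustrated concretely: the majority-vote baseline, and label switching, where a cluster-based model recovers the right groupings but under a permuted set of class ids. The sketch below is a minimal illustration only, not the paper's actual inference procedure; the data format and the brute-force permutation search over class ids are assumptions (feasible only for a small number of classes).

```python
from collections import Counter
from itertools import permutations

def majority_vote(annotations):
    """Baseline aggregation: annotations is a list of per-document
    label lists (one label per annotator); ties break arbitrarily."""
    return [Counter(doc).most_common(1)[0][0] for doc in annotations]

def resolve_label_switching(predicted, reference, num_classes):
    """Search all permutations of class ids for the relabeling of
    `predicted` that maximizes agreement with `reference`
    (e.g. the majority-vote labels, available at inference time)."""
    best_perm, best_agree = None, -1
    for perm in permutations(range(num_classes)):
        agree = sum(perm[p] == r for p, r in zip(predicted, reference))
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[p] for p in predicted]

# Toy example: three documents, three annotators each.
annotations = [[0, 0, 1], [1, 1, 1], [0, 1, 1]]
mv = majority_vote(annotations)          # [0, 1, 1]
model_pred = [1, 0, 0]                   # correct clusters, swapped ids
aligned = resolve_label_switching(model_pred, mv, num_classes=2)
print(mv, aligned)                       # [0, 1, 1] [0, 1, 1]
```

For larger label sets, the exhaustive search over permutations would typically be replaced by an assignment solver such as the Hungarian algorithm on the cluster-vs-reference confusion matrix.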

Details

Paper ID
lrec2014-main-107
Pages
pp. 3704-3711
BibKey
felt-etal-2014-momresp
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26–31 May 2014

Authors

  • Paul Felt
  • Robbie Haertel
  • Eric Ringger
  • Kevin Seppi
