
Crowdsourcing and annotating NER for Twitter #drift

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/3mepts35xng8

Abstract

We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER-annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets; (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible to “catch up” with language drift.

Details

Paper ID
lrec2014-main-361
Pages
pp. 2544-2547
BibKey
fromreide-etal-2014-crowdsourcing
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26–31 May 2014

Authors

  • Hege Fromreide
  • Dirk Hovy
  • Anders Søgaard
