Back to Main Conference 2016
LREC 2016main

Effects of Sampling on Twitter Trend Detection

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/496yiwq6afa6

Abstract

Much research has focused on detecting trends on Twitter, including health-related trends such as mentions of Influenza-like illnesses or their symptoms. The majority of this research has been conducted using Twitter's public feed, which includes only about 1% of all public tweets. It is unclear if, when, and how using Twitter's 1% feed has affected the evaluation of trend detection methods. In this work we use a larger feed to investigate the effects of sampling on Twitter trend detection. We focus on using health-related trends to estimate the prevalence of Influenza-like illnesses based on tweets. We use ground truth obtained from the CDC and Google Flu Trends to explore how the prevalence estimates degrade when moving from a 100% to a 1% sample. We find that using the 1% sample is unlikely to substantially harm ILI estimates made at the national level, but can cause poor performance when estimates are made at the city level.

Details

Paper ID
lrec2016-main-479
Pages
pp. 2998-3005
BibKey
yates-etal-2016-effects
Editors
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 - 28 May 2016

Authors

  • AY

    Andrew Yates

  • AK

    Alek Kolcz

  • NG

    Nazli Goharian

  • OF

    Ophir Frieder

Links