Back to Main Conference 2016
LREC 2016main

EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/2cm8di77f76c

Abstract

Code-switching texts are those that contain terms in two or more different languages, and they appear increasingly often in social media. The aim of this paper is to provide a resource to the research community to evaluate the performance of sentiment classification techniques on this complex multilingual environment, proposing an English-Spanish corpus of tweets with code-switching (EN-ES-CS CORPUS). The tweets are labeled according to two well-known criteria used for this purpose: SentiStrength and a trinary scale (positive, neutral and negative categories). Preliminary work on the resource is already done, providing a set of baselines for the research community.

Details

Paper ID
lrec2016-main-655
Pages
pp. 4149-4153
BibKey
vilares-etal-2016-en
Editors
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 - 28 May 2016

Authors

  • DV

    David Vilares

  • MA

    Miguel A. Alonso

  • CG

    Carlos Gómez-Rodríguez

Links