TweetNorm_es: an annotated corpus for Spanish microtext normalization

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/2peqrmad7i37

Abstract

In this paper we introduce TweetNorm_es, an annotated corpus of Spanish-language tweets, which we make publicly available under the terms of the CC-BY license. The corpus is intended for the development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort by several research groups. We describe the methodology used to build the corpus and the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, the first evaluation setting in which the corpus was used.

Details

Paper ID
lrec2014-main-379
Pages
pp. 2274-2278
BibKey
alegria-etal-2014-tweetnorm
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26-31 May 2014

Authors

  • Iñaki Alegria
  • Nora Aranberri
  • Pere Comas
  • Víctor Fresno
  • Pablo Gamallo
  • Lluis Padró
  • Iñaki San Vicente
  • Jordi Turmo
  • Arkaitz Zubiaga

Links