TweetMT: A Parallel Microblog Corpus

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/38nvi5oa5nir

Abstract

We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of microtext machine translation systems. In this paper we describe the methodology followed to build the corpus, and present the results of the shared task in which it was tested.

Details

Paper ID
lrec2016-main-469
Pages
pp. 2936-2941
BibKey
vicente-etal-2016-tweetmt
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Iñaki San Vicente
  • Iñaki Alegría
  • Cristina España-Bonet
  • Pablo Gamallo
  • Hugo Gonçalo Oliveira
  • Eva Martínez Garcia
  • Antonio Toral
  • Arkaitz Zubiaga
  • Nora Aranberri