PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/4nmzdtvpzheh

Abstract

Due to the spread of social media-based applications and the challenges that social media texts pose to NLP tools, tailored approaches and ad hoc resources are required to provide proper coverage of specific linguistic phenomena. Various attempts to produce specialized resources and tools of this kind are described in the literature. However, most of these attempts focus on PoS-tagged corpora, and only a few deal with syntactic annotation. This is particularly true for Italian, for which such a resource is currently missing. We thus propose the development of PoSTWITA-UD, a collection of tweets annotated according to a well-known dependency-based annotation format: Universal Dependencies. The goal of this work is manifold; it mainly consists in creating a resource that, especially for Italian, can be exploited for training NLP systems so as to enhance their performance on social media texts. In this paper we focus on the current state of the resource.

Details

Paper ID
lrec2018-main-279
Pages
N/A
BibKey
sanguinetti-etal-2018-postwita
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 to 12 May 2018

Authors

  • Manuela Sanguinetti
  • Cristina Bosco
  • Alberto Lavelli
  • Alessandro Mazzei
  • Oronzo Antonelli
  • Fabio Tamburini
