Back to Main Conference 2018
LREC 2018main

Enriching a Lexicon of Discourse Connectives with Corpus-based Data

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/4swxgpd4mt4w

Abstract

We present the results of the effort of enriching the pre-existing resource LICO, a Lexicon of Italian COnnectives retrieved from lexicographic sources (Feltracco et al., 2016), with real corpus data for connectives marking contrast relations in text. The motivation beyond our effort is that connectives can only be interpreted when they appear in context, that is, in a relation between the two fragments of text that constitute the two arguments of the relation. In this perspective, adding corpus examples annotated with connectives and arguments for the relation allows us to both extend the resource and validate the lexicon. In order to retrieve good corpus examples, we take advantage of the existing Contrast-Ita Bank (Feltracco et al., 2017), a corpus of news annotated with explicit and implicit discourse contrast relations for Italian according to the annotation scheme proposed in the Penn Discourse Tree Bank (PDTB) guidelines (Prasad et al., 2007). We also use an extended -non contrast annotated- version of the same corpus and documents from Wikipedia. The resulting resource represents a valuable tool for both linguistic analyses of discourse relations and the training of a classifier for NLP applications.

Details

Paper ID
lrec2018-main-684
Pages
N/A
BibKey
feltracco-etal-2018-enriching
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 12 May 2018

Authors

  • AF

    Anna Feltracco

  • EJ

    Elisabetta Jezek

  • BM

    Bernardo Magnini

Links