Back to Main Conference 2014
LREC 2014main

Potsdam Commentary Corpus 2.0: Annotation for Discourse Research

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/3a9ax86c94ai

Abstract

We present a revised and extended version of the Potsdam Commentary Corpus, a collection of 175 German newspaper commentaries (op-ed pieces) that has been annotated with syntax trees and three layers of discourse-level information: nominal coreference,connectives and their arguments (similar to the PDTB, Prasad et al. 2008), and trees reflecting discourse structure according to Rhetorical Structure Theory (Mann/Thompson 1988). Connectives have been annotated with the help of a semi-automatic tool, Conano (Stede/Heintze 2004), which identifies most connectives and suggests arguments based on their syntactic category. The other layers have been created manually with dedicated annotation tools. The corpus is made available on the one hand as a set of original XML files produced with the annotation tools, based on identical tokenization. On the other hand, it is distributed together with the open-source linguistic database ANNIS3 (Chiarcos et al. 2008; Zeldes et al. 2009), which provides multi-layer search functionality and layer-specific visualization modules. This allows for comfortable qualitative evaluation of the correlations between annotation layers.

Details

Paper ID
lrec2014-main-468
Pages
pp. 925-929
BibKey
stede-neumann-2014-potsdam
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 31 May 2014

Authors

  • MS

    Manfred Stede

  • AN

    Arne Neumann

Links