Back to Main Conference 2012
LREC 2012main

Multi-Layer Discourse Annotation of a Dutch Text Corpus

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/33se4246reuz

Abstract

We have compiled a corpus of 80 Dutch texts from expository and persuasive genres, which we annotated for rhetorical and genre-specific discourse structure, and lexical cohesion with the goal of creating a gold standard for further research. The annota¬tions are based on a segmentation of the text in elementary discourse units that takes into account cues from syntax and punctuation. During the labor-intensive discourse-structure annotation (RST analysis), we took great care to thoroughly reconcile the initial analyses. That process and the availability of two independent initial analyses for each text allows us to analyze our disagreements and to assess the confusability of RST relations, and thereby improve the annotation guidelines and gather evidence for the classification of these relations into larger groups. We are using this resource for corpus-based studies of discourse relations, discourse markers, cohesion, and genre differences, e.g., the question of how discourse structure and lexical cohesion interact for different genres in the overall organization of texts. We are also exploring automatic text segmentation and semi-automatic discourse annotation.

Details

Paper ID
lrec2012-main-528
Pages
pp. 2820-2825
BibKey
redeker-etal-2012-multi
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 27 May 2012

Authors

  • GR

    Gisela Redeker

  • IB

    Ildikó Berzlánovich

  • Nv

    Nynke van der Vliet

  • GB

    Gosse Bouma

  • ME

    Markus Egg

Links