Discourse-level Annotation over Europarl for Machine Translation: Connectives and Pronouns

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/4zhcprak6g53

Abstract

This paper describes methods and results for the annotation of two discourse-level phenomena, connectives and pronouns, over a multilingual parallel corpus. Excerpts from Europarl in English and French have been annotated with disambiguation information for connectives and pronouns, for about 3600 tokens. This data is then used in several ways: for cross-linguistic studies, for training automatic disambiguation software, and ultimately for training and testing discourse-aware statistical machine translation systems. The paper presents the annotation procedures and their results in detail, and overviews the first systems trained on the annotated resources and their use for machine translation.

Details

Paper ID
lrec2012-main-103
Pages
pp. 2716-2720
BibKey
popescu-belis-etal-2012-discourse
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 to 27 May 2012

Authors

  • Andrei Popescu-Belis

  • Thomas Meyer

  • Jeevanthi Liyanapathirana

  • Bruno Cartoni

  • Sandrine Zufferey
