Large aligned treebanks for syntax-based machine translation

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/2etw3pahb2dc

Abstract

We present a collection of parallel treebanks that have been automatically aligned at both the terminal and the nonterminal constituent levels for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-MT). For the language pair Dutch to English, we present evaluation scores for both the nonterminal constituent alignments and the MT system itself; in the latter case, we compare them with those of Moses, a state-of-the-art statistical MT system, trained on the same data.

Details

Paper ID
lrec2012-main-553
Pages
pp. 467-473
BibKey
kotze-etal-2012-large
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21-27 May 2012

Authors

  • Gideon Kotzé

  • Vincent Vandeghinste

  • Scott Martens

  • Jörg Tiedemann
