Back to Main Conference 2006
LREC 2006main

POS-based Word Reorderings for Statistical Machine Translation

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/4mcaig9njr5d

Abstract

Translation In this work we investigate new possibilities for improving the quality of statistical machine translation (SMT) by applying word reorderings of the source language sentences based on Part-of-Speech tags. Results are presented on the European Parliament corpus containing about 700k sentences and 15M running words. In order to investigate sparse training data scenarios, we also report results obtained on about 1\% of the original corpus. The source languages are Spanish and English and target languages are Spanish, English and German. We propose two types of reorderings depending on the language pair and the translation direction: local reorderings of nouns and adjectives for translation from and into Spanish and long-range reorderings of verbs for translation into German. For our best translation system, we achieve up to 2\% relative reduction of WER and up to 7\% relative increase of BLEU score. Improvements can be seen both on the reordered sentences as well as on the rest of the test corpus. Local reorderings are especially important for the translation systems trained on the small corpus whereas long-range reorderings are more effective for the larger corpus.

Details

Paper ID
lrec2006-main-241
Pages
N/A
BibKey
popovic-ney-2006-pos
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24 May 2006 26 May 2006

Authors

  • MP

    Maja Popović

  • HN

    Hermann Ney

Links