Back to Main Conference 2012
LREC 2012main

Re-ordering Source Sentences for SMT

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/25ch7ccfx98p

Abstract

We propose a pre-processing stage for Statistical Machine Translation (SMT) systems where the words of the source sentence are re-ordered as per the syntax of the target language prior to the alignment process, so that the alignment found by the statistical system is improved. We take a dependency parse of the source sentence and linearize it as per the syntax of the target language, before it is used in either the training or the decoding phase. During this linearization, the ordering decisions among dependency nodes having a common parent are done based on two aspects: parent-child positioning and relation priority. To make the linearization process rule-driven, we assume that the relative word order of a dependency relation's relata does not depend either on the semantic properties of the relata or on the rest of the expression. We also assume that the relative word order of various relations sharing a relata does not depend on the rest of the expression. We experiment with a publicly available English-Hindi parallel corpus and show that our scheme improves the BLEU score.

Details

Paper ID
lrec2012-main-163
Pages
pp. 2164-2171
BibKey
sangodkar-damani-2012-ordering
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 27 May 2012

Authors

  • AS

    Amit Sangodkar

  • OD

    Om Damani

Links