Back to Main Conference 2006
LREC 2006main
Parallel Corpora and Phrase-Based Statistical Machine Translation for New Language Pairs via Multiple Intermediaries
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)
Abstract
We present a large parallel corpus of texts published by the United Nations Organization, which we exploit for the creation ofphrase-based statistical machine translation (SMT) systems for new language pairs. We present a setup where phrase tables for these language pairs are used for translation between languages for which parallel corpora of sufficient size are so far not available. We give some preliminary results for this novel application of SMT and discuss further refinements.