Achieving Domain Specificity in SMT without Overt Siloing

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

Abstract

We examine data pooling as a method for improving Statistical Machine Translation (SMT) quality in narrowly defined domains, such as the data of a particular company or public entity. By pooling all available data, building large SMT engines, and using domain-specific target language models, we observe gains in translation quality. This approach achieves the generalizability and resiliency of a larger SMT engine while retaining the precision of a domain-specific engine.