Parallel Sentence Extraction from Comparable Corpora with Neural Network Features
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Abstract
Parallel corpora are crucial for machine translation (MT); however, they are quite scarce for most language pairs and domains. Because comparable corpora are far more widely available, many studies have sought to extract parallel sentences from them for MT. In this paper, we exploit neural network features acquired from neural MT for parallel sentence extraction. We observe significant improvements in both sentence extraction accuracy and MT performance.