A Corpus of Machine Translation Errors Extracted from Translation Students Exercises

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

Abstract

In this paper, we present a freely available corpus of automatic translations accompanied by post-edited versions and annotated with labels identifying the different kinds of errors made by the MT system. These data have been extracted from translation students' exercises that were corrected by a senior professor. This corpus can be useful for training quality estimation tools and for analyzing the types of errors made by the MT system.