Automatic Ranking of MT Systems
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)
Abstract
In earlier work, we succeeded in automatically predicting the relative rankings of MT systems derived from human judgments of the Fluency, Adequacy or Informativeness of their output. That work yielded two promising, automatically computable predictors: the D-score, based on semantic features of the MT output, and the X-score, based on syntactic features. In this paper, we present an experiment, using human evaluators and additional data, designed to test the robustness of those earlier results. We conclude that the X-score is indeed a robust and reliable predictor, even on new data for which it has not been specifically tuned.