A Fine-Grained Evaluation Method for Speech-to-Speech Machine Translation Using Concept Annotations
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Abstract
In this paper we report on a method for evaluating spoken language translation systems that builds upon a task-based evaluation method developed by CMU. Rather than relying on a predefined database of Interchange Format representations of spoken utterances, our method relies on a set of explicitly defined conventions for creating these interlingual representations. It also departs from CMU's method in its scoring conventions, taking a finer-grained approach to scoring (especially the scoring of predicates). We have attempted to validate the legitimacy of this approach to speech-to-speech MT evaluation by looking for a relationship between the scores generated by this method and the scores generated in a series of experiments using naïve human judgements of the meaning and quality of MT systems' output.