Machine Translation Evaluation:
Human Evaluators Meet Automated Metrics


Motivation and Aims

The Evaluation Working Group of the ISLE project has organised a series of workshops on MT evaluation. Each of these workshops has included a practical component, in which participants were asked to carry out exercises involving MT evaluation. The workshops have proved very illuminating and have stimulated ongoing work in the area, much of it reported at the latest workshop in the series, held at the MT Summit meeting in September 2001.

Results from previous workshops can be consulted at www.issco.unige.ch/projects/isle/ewg.htm and the proceedings from the MT Summit in Santiago de Compostela can be requested from the organisers.

The workshop at LREC will continue the series, and will consist primarily of hands-on exercises defined to investigate empirically a small number of metrics proposed for evaluation of MT systems and the potential relationships between them.

In an effort to develop a more systematic MT evaluation methodology, recent work in the EAGLES and ISLE projects, funded by the EU and NSF, has created a framework of characteristics in terms of which MT evaluations and systems, past and future, can be described and classified. The resulting taxonomy can be consulted at http://issco-www.unige.ch/projects/isle/taxonomy2/.

Previous workshops have led to critical analysis of measures drawn from the literature, and to the creation of new measures. Of the latter, several are aimed at eventual automation of the evaluation task and/or at finding relatively simple and inexpensive measures which correlate well with more complex measures that are hard to automate or expensive to implement.

Given this background, the time has come to concentrate on systematizing the actual evaluation measures themselves. For any particular measure, one would like to know how accurate it is, how expensive and/or difficult to apply, how independent of other measures, etc. Very little of this type of information is available to date.

This workshop will focus on these issues. The organizers will provide the participants in advance with the materials required to:

The participants will then apply these measures to the data made available, and bring their results to the workshop in order to integrate them with other participants' results.

The overall intention of the workshop is to discover, empirically, what kinds of characteristics are easily determinable, and how accurate they actually are. Only through a process of assessing the evaluations can we eventually arrive at a small but accurate set of measures that adequately cover the set of phenomena MT system evaluators, system developers, and potential MT users care about.

It is our hope that participants will feel inspired to continue this process, so that the combined results can be assembled later, integrated into the framework, and become a valuable resource to anyone interested in MT evaluation.

Organizing Committee

Marianne Dabbadie EVALING, Paris (France)
Tony Hartley Centre for Translation Studies, University of Leeds (UK)
Eduard Hovy USC Information Sciences Institute, Marina del Rey (USA)
Margaret King ISSCO/TIM/ETI, University of Geneva (Switzerland)
Bente Maegaard Center for Sprogteknologi, Copenhagen (Denmark)
Sandra Manzi ISSCO/TIM/ETI, University of Geneva (Switzerland)
Keith J. Miller The MITRE Corporation (USA)
Widad Mustafa El Hadi Université Lille III - Charles de Gaulle (France)
Andrei Popescu-Belis ISSCO/TIM/ETI, University of Geneva (Switzerland)
Florence Reeder The MITRE Corporation (USA)
Michelle Vanni U.S. Department of Defense (USA)

Participation

Participants wishing to receive preparatory data should send the following information to the contact person below:

Contact

Andrei Popescu-Belis
ISSCO/TIM/ETI, University of Geneva
40, bd du Pont d'Arve
CH-1211 Geneva 4 - SWITZERLAND
Email (preferred): andrei.popescu-belis@issco.unige.ch
Fax: (41 22) 705 86 86

Important Dates

Registration with the workshop organizers: 20 February 2002
Distribution of pre-workshop material: March 2002
Workshop: 27 May 2002

Preliminary Schedule

Morning session, 09:00 to 13:00
  • Introduction and welcome
  • Background and workshop themes
  • Integration of evaluation exercises (start)

Afternoon session, 14:30 to 18:30
  • Integration of evaluation exercises (continued)
  • Reports
  • Cross-evaluation analysis
  • Final wrap-up

Workshop Registration Fees

The registration fees for the workshop are: