Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

This paper presents a method for evaluating spoken post-editing, by a human translator, of imperfect machine translation output. We compare two approaches to combining machine translation (MT) and automatic speech recognition (ASR): a heuristic algorithm and a machine learning method. To obtain a data set with spoken post-editing information, we use the French versions of TED talks as the source texts submitted to MT, and their spoken English counterparts as the corrections, which are submitted to an ASR system. We experiment with various levels of artificial ASR noise and also with a state-of-the-art ASR system. The results show that combining MT with ASR yields higher BLEU scores than either the MT output or the ASR output alone, especially when ASR performance is low.
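The abstract does not say how the artificial ASR noise was produced. The sketch below is purely illustrative and is not the authors' procedure. It assumes noise is simulated by corrupting each reference word with probability `wer` through a random substitution, deletion, or insertion. It then measures the resulting word error rate (WER) using a word-level Levenshtein distance. The function names `add_asr_noise` and `word_error_rate` and the `vocab` parameter are hypothetical.

```python
import random


def add_asr_noise(words, wer, vocab, seed=0):
    """Hypothetical noise model: each reference word is corrupted with
    probability `wer` by a random substitution, deletion, or insertion."""
    rng = random.Random(seed)  # seeded for reproducible noise levels
    noisy = []
    for w in words:
        if rng.random() < wer:
            op = rng.choice(("sub", "del", "ins"))
            if op == "sub":
                # replace with a different word when the vocabulary allows it
                candidates = [v for v in vocab if v != w] or list(vocab)
                noisy.append(rng.choice(candidates))
            elif op == "ins":
                # keep the word and insert a spurious one after it
                noisy.append(w)
                noisy.append(rng.choice(vocab))
            # "del": the word is simply dropped
        else:
            noisy.append(w)
    return noisy


def word_error_rate(ref, hyp):
    """Word-level Levenshtein distance normalised by reference length."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    ref = "we use the spoken English version as the correction".split()
    vocab = ["a", "of", "talk", "speech", "the"]
    for level in (0.0, 0.1, 0.3):
        hyp = add_asr_noise(ref, level, vocab, seed=1)
        print(level, round(word_error_rate(ref, hyp), 2), " ".join(hyp))
```

The measured WER only approximates the requested rate. An inserted word counts as one extra error, and on short inputs random sampling can drift either way. A real study would more likely control noise against a measured WER over a whole corpus.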

Details

Paper ID

lrec2016-main-355

Pages

pp. 2232-2239

DOI

10.63317/488evu67ez4c

BibKey

liyanapathirana-popescu-belis-2016-using

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23 - 28 May 2016

Authors

Jeevanthi Liyanapathirana
Andrei Popescu-Belis
