Phrase Level Segmentation and Labelling of Machine Translation Errors

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

This paper presents our work towards a novel approach for Quality Estimation (QE) of machine translation based on sequences of adjacent words, the so-called phrases. This new level of QE aims to provide a natural balance between QE at the word and sentence levels, which are respectively too fine-grained and too coarse for some applications. However, phrase-level QE poses an intrinsic challenge: how to segment a machine translation into sequences of words (contiguous or not) that each represent an error. We discuss three possible segmentation strategies to automatically extract erroneous phrases. We evaluate these strategies against phrase-level annotations produced by humans, using a new dataset collected for this purpose.

Details

Paper ID

lrec2016-main-356

Pages

pp. 2240-2245

DOI

10.63317/28j85hxstppn

BibKey

blain-etal-2016-phrase

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23-28 May 2016

Authors

Frédéric Blain
Varvara Logacheva
Lucia Specia
