LREC 2016 main

Building an Arabic Machine Translation Post-Edited Corpus: Guidelines and Annotation

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/4wiuxtfyzc54

Abstract

We present our guidelines and annotation procedure for creating a manually post-edited machine translation corpus for Modern Standard Arabic. Our overarching goal is to use the annotated corpus to develop automatic machine translation post-editing systems for Arabic that can help accelerate the human revision of translated texts. Creating any manually annotated corpus usually presents many challenges. To address them, we created comprehensive and simplified annotation guidelines, which were used by a team of five annotators and one lead annotator. To ensure high agreement among the annotators, we held multiple training sessions and measured inter-annotator agreement regularly to check annotation quality. The resulting corpus of manually post-edited English-to-Arabic translations of articles is the largest to date for this language pair.

Details

Paper ID
lrec2016-main-295
Pages
pp. 1869-1876
BibKey
zaghouani-etal-2016-building
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Wajdi Zaghouani
  • Nizar Habash
  • Ossama Obeid
  • Behrang Mohit
  • Houda Bouamor
  • Kemal Oflazer
