Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/2fsnu3dm2m7w

Abstract

The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course material, ranging from course forum text to subtitles of online video lectures, that has been developed via large-scale crowdsourcing. The English source text is manually translated into 11 European and BRIC languages using the CrowdFlower platform. During the process several challenges arose which mainly involved the in-domain text genre, the large text volume, the idiosyncrasies of each target language, the limitations of the crowdsourcing platform, as well as the quality assurance and workflow issues of the crowdsourcing process. The corpus constitutes a product of the EU-funded TraMOOC project and is utilised in the project in order to train, tune and test machine translation engines.

Details

Paper ID
lrec2018-main-075
Pages
N/A
BibKey
sosoni-etal-2018-translation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 - 12 May 2018

Authors

  • Vilelmini Sosoni
  • Katia Lida Kermanidis
  • Maria Stasimioti
  • Thanasis Naskos
  • Eirini Takoulidou
  • Menno van Zaanen
  • Sheila Castilho
  • Panayota Georgakopoulou
  • Valia Kordoni
  • Markus Egg
