LREC 2008, Main Conference

CzEng 0.7: Parallel Corpus with Community-Supplied Translations

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/5abt7yfyc9ph

Abstract

This paper describes CzEng 0.7, a new release of a Czech-English parallel corpus that is freely available for research and educational purposes. We provide basic statistics of the corpus and focus on data produced by a community of volunteers. Anonymous contributors manually correct the output of a machine translation (MT) system, generating on average 2000 sentences a month, 70% of which are indeed correct translations. We compare the utility of community-supplied and professionally translated training data for a baseline English-to-Czech MT system.

Details

Paper ID
lrec2008-main-058
Pages
N/A
BibKey
bojar-etal-2008-czeng
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28-30 May 2008

Authors

  • Ondřej Bojar

  • Miroslav Janíček

  • Zdeněk Žabokrtský

  • Pavel Češka

  • Peter Beňa

Links