Prague Czech-English Dependency Treebank. Syntactically Annotated Resources for Machine Translation

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/4q48htam6tur

Abstract

This paper introduces the Prague Czech-English Dependency Treebank (PCEDT), a new Czech-English parallel resource suitable for experiments in structural machine translation. We describe the process of building the core parts of the resources: a bilingual syntactically annotated corpus and translation dictionaries. A part of the Penn Treebank has been translated into Czech and its annotation transformed into a dependency annotation scheme. The annotation of Czech was done automatically from plain text. A subset of corresponding Czech and English sentences has been annotated by humans. The resources being created at Charles University in Prague are scheduled for release as a Linguistic Data Consortium data collection in 2004. First experiments in Czech-English machine translation using these data have already been carried out.

Details

Paper ID
lrec2004-main-481
Pages
N/A
BibKey
cmejrek-etal-2004-prague
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 to 28 May 2004

Authors

  • Martin Čmejrek

  • Jan Cuřín

  • Jiří Havelka

  • Jan Hajič

  • Vladislav Kuboň

Links