Back to Main Conference 2010
LREC 2010main

Enhanced Infrastructure for Creation and Collection of Translation Resources

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/2gjv3vwijzef

Abstract

Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system training and development. This paper describes recent efforts at Linguistic Data Consortium to create linguistic resources for MT, including corpora, specifications and resource infrastructure. We review LDC's three-pronged ap-proach to parallel text corpus development (acquisition of existing parallel text from known repositories, harvesting and aligning of potential parallel documents from the web, and manual creation of parallel text by professional translators), and describe recent adap-tations that have enabled significant expansions in the scope, variety, quality, efficiency and cost-effectiveness of translation resource creation at LDC.

Details

Paper ID
lrec2010-main-549
Pages
N/A
BibKey
song-etal-2010-enhanced
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • ZS

    Zhiyi Song

  • SS

    Stephanie Strassel

  • GK

    Gary Krug

  • KM

    Kazuaki Maeda

Links