A Parallel Corpus of Italian/German Legal Texts

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/495pouo2t8ug

Abstract

This paper presents the creation of a parallel corpus of Italian and German legal documents that are translations of one another. The corpus, which contains approximately 5 million words, is primarily intended as a resource for (semi-)automatic terminology acquisition. The guidelines of the Corpus Encoding Standard have been applied to encode structural information, segmentation information, and sentence alignment. Since the parallel texts have a one-to-one correspondence at the sentence level, building a perfect sentence alignment is rather straightforward. As a result, the corpus also constitutes a valuable testbed for the evaluation of alignment algorithms. The paper discusses the intended use of the corpus, the various phases of corpus compilation, and basic statistics.

Details

Paper ID
lrec2000-main-104
Pages
N/A
BibKey
gamper-2000-parallel
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May to 2 June 2000

Authors

  • Johann Gamper

Links