Title |
A Parallel Corpus of Italian/German Legal Texts |
Authors |
Gamper Johann (European Academy Bolzano, Scientific Area “Language and Law”, Weggensteinstr. 12a, 39100 Bozen, Italy, jgamper@eurac.edu) |
Keywords |
CES, Corpus Encoding, Parallel Corpus |
Session |
Session WP3 - Multilingual Corpora |
Full Paper |
140.ps, 140.pdf |
Abstract |
This paper presents the creation of a parallel corpus of Italian and German legal documents that are translations of one another. The corpus, which contains approximately 5 million words, is primarily intended as a resource for (semi-)automatic terminology acquisition. The guidelines of the Corpus Encoding Standard (CES) have been applied to encode structural information, segmentation information, and sentence alignment. Since the parallel texts correspond one-to-one at the sentence level, building a perfect sentence alignment is rather straightforward. As a result, the corpus also constitutes a valuable testbed for the evaluation of alignment algorithms. The paper discusses the intended use of the corpus, the various phases of corpus compilation, and basic statistics. |
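
To illustrate why a one-to-one correspondence makes sentence alignment straightforward, here is a minimal Python sketch. It is not the procedure used in the paper: the function name `align_one_to_one`, the sentence identifier scheme (`it.s1`, `de.s1`), and the sample sentences are all assumptions made for illustration. Under the assumption that both texts are already segmented into the same number of sentences, alignment reduces to pairing sentences by position.

```python
from typing import List, Tuple


def align_one_to_one(italian: List[str], german: List[str]) -> List[Tuple[str, str]]:
    """Pair sentences by position and return (Italian id, German id) links.

    Assumes both texts are already segmented and correspond sentence by
    sentence. The id scheme 'it.sN' / 'de.sN' is hypothetical.
    """
    if len(italian) != len(german):
        # A length mismatch means the one-to-one assumption does not hold.
        raise ValueError(
            f"sentence counts differ: {len(italian)} vs {len(german)}"
        )
    return [(f"it.s{i}", f"de.s{i}") for i in range(1, len(italian) + 1)]


if __name__ == "__main__":
    # Placeholder sentences, not taken from the corpus.
    it_sentences = ["Prima frase.", "Seconda frase."]
    de_sentences = ["Erster Satz.", "Zweiter Satz."]
    for it_id, de_id in align_one_to_one(it_sentences, de_sentences):
        print(it_id, de_id)
```

Each resulting pair could then be serialized as an alignment link in whatever encoding the corpus uses. A mismatch in sentence counts is reported as an error rather than repaired, since it would indicate a segmentation problem in one of the two texts.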