Back to Main Conference 2008
LREC 2008main

Portuguese-English Word Alignment: some Experiments

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/3rme8qw5ifb9

Abstract

In this paper we describe some studies of Portuguese-English word alignment, focusing on (i) measuring the importance of the coupling between dictionaries and corpus; (ii) assessing the relevance of using syntactic information (POS and lemma) or just word forms, and (iii) taking into account the direction of translation. We first provide some motivation for the studies, as well as insist in separating type from token anlignment. We then briefly describe the resources employed: the EuroParl and COMPARA corpora, and the alignment tools, NATools, introducing some measures to evaluate the two kinds of dictionaries obtained. We then present the results of several experiments, comparing sizes, overlap, translation fertility and alignment density of the several bilingual resources built. We also describe preliminary data as far as quality of the resulting dictionaries or alignment results is concerned.

Details

Paper ID
lrec2008-main-364
Pages
N/A
BibKey
santos-simoes-2008-portuguese
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • DS

    Diana Santos

  • AS

    Alberto Simões

Links