Cross-Corpus Evaluation of Word Alignment

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/3zgn4di43c8g

Abstract

We present the procedures we implemented to carry out a system-oriented evaluation of ALIBI, a syntax-based word aligner. Although cross-corpus evaluation is still relatively rare in NLP, we treat it as part of system-oriented evaluation. Our hypothesis is that the granularity of alignments and the level of syntactic correspondence depend on corpus type; our objective is to assess how this dependence affects alignment quality. We test our system on three English-French parallel corpora. The evaluation procedures are defined in accordance with state-of-the-art word alignment evaluation principles. For each corpus, they include the creation of a reference set containing multiple annotations of the same data, the assessment of inter-annotator agreement rates, and an analysis of the resulting reference set. We show that alignment performance varies across corpora according to the multiple reference annotations produced, and we further motivate our choice of preserving all reference annotations without resolving disagreements between annotators.

Details

Paper ID
lrec2008-main-380
Pages
N/A
BibKey
ozdowska-2008-cross
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 to 30 May 2008

Authors

  • Sylwia Ozdowska
