Some Technical Aspects about Aligning Near Languages
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)
Abstract
IULA at UPF has developed an aligner that uses the results of corpus processing to produce accurate and robust alignments, even for noisy parallel corpora. It compares the lemmata and part-of-speech tags of the analysed texts. The approach has two main limitations: first, it apparently works only for near languages; second, it requires morphological taggers for the languages being compared. These two constraints prevent the technique from being applied to arbitrary language pairs. Whenever it is applicable, it achieves high-quality results.
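
To illustrate the general idea of comparing lemmata and part-of-speech tags, the following is a minimal sketch and not the IULA aligner itself. It assumes each sentence has already been tagged into (lemma, POS) pairs with a shared tagset, and it scores two sentences by counting tokens whose tags agree and whose lemmata are similar strings. The function names, the 0.7 threshold, the Dice-style score, and the Spanish/Catalan sample data are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical token representation: (lemma, part-of-speech tag).
Token = Tuple[str, str]


def token_match(a: Token, b: Token, threshold: float = 0.7) -> bool:
    """Return True if the POS tags agree and the lemmata are similar strings.

    The string-similarity test stands in for the intuition that near
    languages share many cognate lemmata; the threshold is an assumption.
    """
    if a[1] != b[1]:
        return False
    return SequenceMatcher(None, a[0], b[0]).ratio() >= threshold


def sentence_similarity(src: List[Token], tgt: List[Token],
                        threshold: float = 0.7) -> float:
    """Dice-style score: 2 * matched tokens / (len(src) + len(tgt))."""
    if not src or not tgt:
        return 0.0
    used = set()  # indices of target tokens already paired
    matches = 0
    for s in src:
        for j, t in enumerate(tgt):
            if j not in used and token_match(s, t, threshold):
                used.add(j)
                matches += 1
                break
    return 2 * matches / (len(src) + len(tgt))


# Illustrative, hand-written analyses (not real tagger output).
spanish = [("el", "DET"), ("casa", "NOUN"), ("ser", "VERB"), ("grande", "ADJ")]
catalan = [("el", "DET"), ("casa", "NOUN"), ("ser", "VERB"), ("gran", "ADJ")]
unrelated = [("gat", "NOUN"), ("dormir", "VERB")]

good_pair = sentence_similarity(spanish, catalan)
bad_pair = sentence_similarity(spanish, unrelated)
print(good_pair, bad_pair)
```

A score of this kind could feed a sentence-level alignment search, with high-scoring pairs acting as anchors. The sketch also makes the two limitations concrete: the lemma comparison only helps when the languages share similar word forms, and the input must come from morphological taggers for both languages.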