LREC 2000 2nd International Conference on Language Resources & Evaluation
 

Title Some Technical Aspects about Aligning Near Languages
Authors de Yzaguirre Lluís (Institute for Applied Linguistics, Universitat Pompeu Fabra, La Rambla, 30-32, 08002 Barcelona, Spain, de_yza@upf.es)
Ribas Marta (Institute for Applied Linguistics, Universitat Pompeu Fabra, La Rambla, 30-32, 08002 Barcelona, Spain)
Vivaldi Jordi (Institute for Applied Linguistics, Universitat Pompeu Fabra, Rambla Santa Mònica, 30, 08002 Barcelona, Spain, jorge.vivaldi@info.upf.es)
Cabré M. Teresa (Institute for Applied Linguistics, Universitat Pompeu Fabra, Rambla Santa Mònica, 30, 08002 Barcelona, Spain, teresa.cabre@trad.upf.es)
Keywords Lemma and Part-of-Speech Based Alignment, Sentence Alignment
Session Session WP3 - Multilingual Corpora
Full Paper 186.ps, 186.pdf
Abstract IULA at UPF has developed an aligner that draws on corpus processing results to produce accurate and robust alignments, even for noisy parallel corpora. It compares the lemmata and part-of-speech tags of the analysed texts, which entails two main constraints. First, it apparently works only for near languages; second, it requires morphological taggers for both of the languages being compared. These two constraints prevent the technique from being used for arbitrary language pairs. Whenever it is applicable, however, it achieves high-quality results.