Automatic Extraction of English-Chinese Term Lexicons from Noisy Bilingual Corpora

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/3393ce7cp9db

Abstract

This paper describes our system, which is designed to extract English-Chinese term lexicons from noisy, complex bilingual corpora and to use them as a translation lexicon for checking sentence alignment results. The noisy bilingual corpora are first aligned by our improved length-based statistical approach, which can partly detect sentence omissions and insertions. A term extraction system is then used to obtain term translation lexicons from the roughly aligned corpora, and the statistical approach is used to align the corpora again. Finally, we filter the noisy bilingual texts and obtain nearly perfectly aligned corpora.
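
The length-based first pass mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' improved method, whose details are not given on this page; it is a generic dynamic-programming aligner in the spirit of classic length-based alignment. The function names `length_cost` and `align`, the `ratio` parameter, and the penalty values in `BEAD_PENALTY` are all hypothetical choices made for illustration.

```python
import math

# Hypothetical penalties per bead type (source sentences, target sentences).
# (1, 0) models an omission and (0, 1) an insertion. Values are illustrative,
# not taken from the paper.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 2.0, (1, 2): 2.0}


def length_cost(src_len, tgt_len, ratio=1.0):
    """Mismatch between the expected and the actual target length."""
    expected = src_len * ratio  # ratio: assumed target/source length ratio
    return abs(tgt_len - expected) / math.sqrt(max(src_len + tgt_len, 1))


def align(src_lens, tgt_lens, ratio=1.0):
    """Return the cheapest bead sequence as (source indices, target indices)."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = penalty
                if di and dj:
                    # Only beads with text on both sides get a length cost.
                    step += length_cost(sum(src_lens[i:ni]),
                                        sum(tgt_lens[j:nj]), ratio)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

With sentence lengths `[20, 35, 10]` on the source side and `[20, 10]` on the target side, `align` pairs the first and last sentences and leaves the middle source sentence unmatched, which is the kind of omission a length-based pass can flag before a term lexicon is used to check the result.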

Details

Paper ID
lrec2000-main-156
Pages
N/A
BibKey
sun-etal-2000-automatic
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 - 2 June 2000

Authors

  • Le Sun

  • Youbing Jin

  • Lin Du

  • Yufang Sun
