
Enhanced Japanese Electronic Dictionary Look-up

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/38q4x7vp3b6f

Abstract

This paper describes the process of data preparation and reading generation for an ongoing project aimed at improving the accessibility of unknown words for learners of foreign languages, focusing initially on Japanese. Rather than requiring absolute knowledge of the readings of words in the foreign language, we allow look-up of dictionary entries by readings which learners can predictably be expected to associate with them. We automatically extract an exhaustive set of phonemic readings for each grapheme segment and learn basic morpho-phonological rules governing compound word formation, associating a probability with each. We then apply a naive Bayes model to generate a set of readings and assign each a likelihood score based on the previously extracted evidence and corpus frequencies.
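As a rough illustration of the scoring step described in the abstract, the following minimal sketch combines per-segment reading probabilities under a naive independence assumption. It is not the authors' implementation: the table `SEGMENT_READINGS`, its probability values, and the function `generate_readings` are hypothetical and invented for this example, and the morpho-phonological rules and corpus frequencies mentioned above are left out.

```python
from itertools import product

# Hypothetical P(reading | grapheme segment) values, invented for illustration;
# they are not taken from the paper or from any dictionary data.
SEGMENT_READINGS = {
    "発": {"hatsu": 0.75, "hotsu": 0.25},
    "表": {"hyou": 0.5, "omote": 0.5},
}


def generate_readings(segments):
    """Return (reading, score) pairs, highest score first.

    Each candidate reading concatenates one reading per segment, and its
    score is the product of the per-segment probabilities (segments are
    treated as independent, which is the naive assumption).
    """
    options = [SEGMENT_READINGS[seg].items() for seg in segments]
    scored = {}
    for combo in product(*options):
        reading = "".join(r for r, _ in combo)
        score = 1.0
        for _, prob in combo:
            score *= prob
        # Different segment readings may concatenate to the same string,
        # so scores for identical candidates are summed.
        scored[reading] = scored.get(reading, 0.0) + score
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


ranked = generate_readings(["発", "表"])
for reading, score in ranked:
    print(reading, score)
```

Note that the dictionary reading of 発表 is "happyou", which arises through gemination and is not among the candidates produced here; covering such forms is what the learned compound-formation rules in the abstract are for.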

Details

Paper ID: lrec2002-main-313
Pages: N/A
BibKey: baldwin-etal-2002-enhanced
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: N/A
Conference: Third International Conference on Language Resources and Evaluation
Location: Las Palmas, Spain
Date: 29 May 2002 to 31 May 2002

Authors

  • Timothy Baldwin
  • Slaven Bilac
  • Ryo Okumura
  • Takenobu Tokunaga
  • Hozumi Tanaka
