Back to Main Conference 2010
LREC 2010main

Online Japanese Unknown Morpheme Detection using Orthographic Variation

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/2tzomuj3tzzb

Abstract

To solve the unknown morpheme problem in Japanese morphological analysis, we previously proposed a novel framework of online unknown morpheme acquisition and its implementation. This framework poses a previously unexplored problem, online unknown morpheme detection. Online unknown morpheme detection is a task of finding morphemes in each sentence that are not listed in a given lexicon. Unlike in English, it is a non-trivial task because Japanese does not delimit words by white space. We first present a baseline method that simply uses the output of the morphological analyzer. We then show that it fails to detect some unknown morphemes because they are over-segmented into shorter registered morphemes. To cope with this problem, we present a simple solution, the use of orthographic variation of Japanese. Under the assumption that orthographic variants behave similarly, each over-segmentation candidate is checked against its counterparts. Experiments show that the proposed method improves the recall of detection and contributes to improving unknown morpheme acquisition.

Details

Paper ID
lrec2010-main-111
Pages
N/A
BibKey
murawaki-kurohashi-2010-online
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • YM

    Yugo Murawaki

  • SK

    Sadao Kurohashi

Links