Back to Home

Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

  1. Click the edit button next to a field to report a correction.
  2. Fill in the suggested correction value for each field you want to correct.
  3. Provide your name and email so we can contact you if needed.

Paper Information

lrec2000-main-059

Extraction of Unknown Words Using the Probability of Accepting the Kanji Character Sequence as One Word

Paper Fields

Click the edit button next to a field to report a correction.

Title

Extraction of Unknown Words Using the Probability of Accepting the Kanji Character Sequence as One Word

Abstract

In this paper, we propose a method to extract unknown words, which are composed of two or three kahji characters, from Japanase text. Generally the known word composed of kanji characters are segmented into other words by the morphological analysis. Moreover, the appearance probability of each segmented word is small. By these features, we can define the measure of accepting two or three kanji character sequence as an unknown word. On the other hand, we can find some segmentation patterns of unknown words. By applying our measure to kanji character sequences which have these patterns, we can extract unknown words. In the experiment, the F-measuer for extraction of known words composed of two and three kanji characters was about 0.7 and 0.4 respectively. Our method does not need to use the frequency of the word in the training corpus to judge whether its word is the unknown word or not. Therefore, our method has the advantage that low frequent unknown words are extracted.


Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.


PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.

Drag & drop a PDF here, or click to select

Your Information

Author Declaration *

Select at least one field to correct using the edit buttons above.