Back to Main Conference 2000
LREC 2000main

Collocations as Word Co-ocurrence Restriction Data - An Application to Japanese Word Processor -

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/3itbw29en25f

Abstract

Collocations, the com bination of specific words are quite useful linguistic resources for NLP in general. The purpose of this paper is to show their usefulness, exem plifying an application to K anji character decision processes for Japanese w ord processors. U nlike recent trials of autom atic extraction, our collocations were collected m anually through many years of intensive investigation of corpus. Our collection procedure consists of (1) finding a proper com bination of words in a corpus and (2) recollecting similar com binations of words, incited by it. This procedure, which depends on hum an judgm ent and the enrichm ent of data by association, is effective for rem edying the sparseness of data problem , although the arbitrariness of hum an judgm ent is inevitable. A pproximately seventy two thousand and four hundred collocations w ere used as w ord co-occurrence restriction data for deciding K anji characters in the processing of Japanese w ord processores. Experiments have show n that the collocation data yield 8.9% higher fraction of Kana-to-Kanji character conversion accuracy than the system w hich uses no collocation data and 7.0% higher, than a com m ercial word processor software of average perform ance.

Details

Paper ID
lrec2000-main-002
Pages
N/A
BibKey
shudo-etal-2000-collocations
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 2 June 2000

Authors

  • KS

    Kosho Shudo

  • MT

    Masahito Takahashi

  • YK

    Yasuo Koyama

  • KY

    Kenji Yoshimura

Links