Construction of a Word Sense Tagged Corpus for SENSEVAL-2 Japanese Dictionary Task

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/36zzdrcxg7d8

Abstract

This paper reports the details of a Japanese word sense tagged corpus developed as evaluation data for the SENSEVAL-2 Japanese dictionary task. The corpus is made up of 2,130 newspaper articles. Only 10,000 words in the articles, rather than all of them, were manually annotated with sense IDs, and these annotations were used as the gold standard data. Word senses were defined according to the Iwanami Kokugo Jiten, a Japanese dictionary published by Iwanami Shoten. Two annotators chose a sense ID for each instance separately. If they did not agree, a third annotator chose the correct sense ID between the two candidates. Inter-tagger agreement and Cohen's κ were 86.3% and 0.677, respectively.
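The two figures reported in the abstract can be illustrated with a minimal sketch. It assumes the standard two-rater definitions of observed agreement and Cohen's kappa; the abstract does not say exactly how the paper aggregated these over target words, so this is not necessarily the author's procedure. The function names and the toy sense IDs ("s1", "s2", "s3") are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of instances on which the two annotators chose the same sense ID."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("both annotators must tag the same non-empty set of instances")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def cohens_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators: (p_o - p_e) / (1 - p_e)."""
    n = len(tags_a)
    p_o = observed_agreement(tags_a, tags_b)
    count_a = Counter(tags_a)
    count_b = Counter(tags_b)
    # Chance agreement: probability that both pick the same sense ID
    # if each draws independently from their own label distribution.
    p_e = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def adjudicate(tag_a, tag_b, third_choice):
    """Gold tag: the shared tag if the two agree, else the third annotator's
    pick, which must be one of the two candidates."""
    if tag_a == tag_b:
        return tag_a
    if third_choice not in (tag_a, tag_b):
        raise ValueError("third annotator must choose between the two candidates")
    return third_choice


# Hypothetical toy data: six instances of one target word.
annotator_1 = ["s1", "s1", "s2", "s2", "s1", "s3"]
annotator_2 = ["s1", "s2", "s2", "s2", "s1", "s3"]

agreement = observed_agreement(annotator_1, annotator_2)  # 5 of 6 instances match
kappa = cohens_kappa(annotator_1, annotator_2)            # (5/6 - 13/36) / (1 - 13/36)
gold_tag = adjudicate("s1", "s2", "s2")                   # disagreement resolved by a third pick
```

Kappa discounts the agreement expected by chance, which is why a raw agreement of 86.3% can correspond to a noticeably lower value such as 0.677 when a few senses dominate the label distribution.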

Details

Paper ID
lrec2002-main-150
Pages
N/A
BibKey
shirai-2002-construction
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29 May 2002 to 31 May 2002

Authors

  • Kiyoaki Shirai
