Use of XML and Relational Databases for Consistent Development and Maintenance of Lexicons and Annotated Corpora

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/3ouk5ojcvgbc

Abstract

In this paper, we present a use of XML and relational databases for developing and maintaining Japanese linguistic resources. In languages whose texts do not mark word boundaries (e.g. Chinese and Japanese), a consistent definition of word delimitation in the lexicon is critical for building POS-tagged corpora. When we change the definition of word delimitation in the lexicon, we need to modify the tagged corpora to keep them consistent with the lexicon. We propose using a relational database to perform these modifications in tandem. Moreover, Japanese has several standards for word delimitation. To accommodate more than one definition of word delimitation, we build a compound word lexicon in the database. The compound word lexicon includes the dependency structures of compound words.
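
The idea of updating the lexicon and the tagged corpus in tandem can be illustrated with a minimal sketch. It is not the authors' schema or implementation: the table names (lexicon, corpus_token), the helper functions, and the example words are all assumptions made for illustration, and SQLite stands in for whatever relational database the paper uses. Corpus tokens refer to lexicon entries by id, so splitting one lexicon entry into shorter units and rewriting every affected sentence can happen inside a single transaction.

```python
import sqlite3


def build_demo():
    """Create a toy lexicon and one tagged sentence (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript("""
        CREATE TABLE lexicon (
            id      INTEGER PRIMARY KEY,
            surface TEXT NOT NULL,
            pos     TEXT NOT NULL
        );
        CREATE TABLE corpus_token (
            sentence_id INTEGER NOT NULL,
            position    INTEGER NOT NULL,
            lexicon_id  INTEGER NOT NULL REFERENCES lexicon(id),
            PRIMARY KEY (sentence_id, position)
        );
    """)
    conn.executemany(
        "INSERT INTO lexicon VALUES (?, ?, ?)",
        [(1, "東京都", "noun"), (2, "に", "particle"), (3, "住む", "verb")],
    )
    conn.executemany(
        "INSERT INTO corpus_token VALUES (?, ?, ?)",
        [(1, 0, 1), (1, 1, 2), (1, 2, 3)],
    )
    conn.commit()
    return conn


def split_entry(conn, old_id, parts):
    """Replace one lexicon entry by shorter units and rewrite the corpus.

    parts is a list of (surface, pos) pairs. Everything runs in one
    transaction, so lexicon and corpus never disagree after a commit.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        new_ids = [
            conn.execute(
                "INSERT INTO lexicon (surface, pos) VALUES (?, ?)", part
            ).lastrowid
            for part in parts
        ]
        affected = [
            row[0]
            for row in conn.execute(
                "SELECT DISTINCT sentence_id FROM corpus_token"
                " WHERE lexicon_id = ?",
                (old_id,),
            )
        ]
        for sid in affected:
            old_seq = [
                row[0]
                for row in conn.execute(
                    "SELECT lexicon_id FROM corpus_token"
                    " WHERE sentence_id = ? ORDER BY position",
                    (sid,),
                )
            ]
            # Expand every occurrence of the old entry into the new units.
            new_seq = []
            for lex_id in old_seq:
                new_seq.extend(new_ids if lex_id == old_id else [lex_id])
            conn.execute("DELETE FROM corpus_token WHERE sentence_id = ?", (sid,))
            conn.executemany(
                "INSERT INTO corpus_token VALUES (?, ?, ?)",
                [(sid, pos, lex_id) for pos, lex_id in enumerate(new_seq)],
            )
        conn.execute("DELETE FROM lexicon WHERE id = ?", (old_id,))
    return new_ids


def sentence_surfaces(conn, sentence_id):
    """Return the surface forms of a sentence in corpus order."""
    rows = conn.execute(
        "SELECT l.surface FROM corpus_token t"
        " JOIN lexicon l ON l.id = t.lexicon_id"
        " WHERE t.sentence_id = ? ORDER BY t.position",
        (sentence_id,),
    )
    return [row[0] for row in rows]
```

Calling split_entry(conn, 1, [("東京", "noun"), ("都", "suffix")]) on the demo database replaces the single token 東京都 with two tokens in every sentence that contains it. Because corpus rows reference lexicon ids, the foreign-key constraint rejects any change that would leave a token pointing at a deleted entry.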

Details

Paper ID
lrec2002-main-191
Pages
N/A
BibKey
asahara-etal-2002-use
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29 May 2002 to 31 May 2002

Authors

  • Masayuki Asahara
  • Ryuichi Yoneda
  • Akiko Yamashita
  • Yasuharu Den
  • Yuji Matsumoto
