Annotating, Disambiguating & Automatically Extending the Coverage of the Swedish SIMPLE Lexicon

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

Abstract

In recent years, the development of high-quality lexical resources for real-world Natural Language Processing (NLP) applications has attracted considerable attention from research groups around the world and from the European Union, which has promoted language engineering projects dealing directly or indirectly with this topic. In this paper, we focus on ways to extend and enrich one such resource, the Swedish version of the SIMPLE lexicon, automatically. The SIMPLE project ({\it Semantic Information for Multifunctional Plurilingual Lexica}) aims at developing wide-coverage semantic lexicons for 12 European languages, though on a scale that is rather small for practical NLP, namely fewer than 10,000 entries. Consequently, our intention is to explore and exploit various inexpensive methods to progressively enrich the resources and, subsequently, to annotate texts with the semantic information encoded within the SIMPLE framework, enhanced with semantic data from the {\it Gothenburg Lexical DataBase} (GLDB) and from large corpora.