Expanding lexicons by inducing paradigms and validating attested forms

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/4uss54c5fpo5

Abstract

One of the bottlenecks in Natural Language Processing for a given language is creating a lexicon that covers the language. The morphological lexicon provides two important pieces of information for NLP applications: 1) the normalization of a word, its lemmatization, which allows the application to recognize two variants of the same word; and 2) the part-of-speech roles that the word can play, which allow the application to parse the text, creating relations between the words in a text. Many NLP applications, such as Information Retrieval, Classification, and Terminology Extraction, depend upon the normalization and parsing information found in lexicons. When words are not present in these lexicons, it is difficult to predict what their proper lemmatizations and parts-of-speech are. In this paper we present a technique for updating a lexicon when an unknown word is encountered: paradigms are induced from an existing, but incomplete, lexicon, and the paradigm proposed for the unknown word is validated using corpus evidence.
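The general idea of inducing paradigms and then checking them against attested forms can be illustrated with a minimal Python sketch. This is not the authors' implementation: the suffix-set representation of a paradigm, the function names `induce_paradigms` and `propose`, the toy lexicon, and the ranking by number of attested forms are all assumptions made for illustration, and part-of-speech assignment is left out entirely.

```python
from collections import Counter
from os.path import commonprefix  # character-wise longest common prefix


def induce_paradigms(lexicon):
    """Count how many lemmas share each suffix set (a hypothetical 'paradigm')."""
    paradigms = Counter()
    for forms in lexicon.values():
        stem = commonprefix(sorted(forms))
        suffixes = frozenset(form[len(stem):] for form in forms)
        paradigms[suffixes] += 1
    return paradigms


def propose(word, paradigms, corpus_vocab):
    """Return (attested, paradigm_frequency, stem, suffixes) tuples, best first."""
    hypotheses = []
    for suffixes, frequency in paradigms.items():
        for suffix in suffixes:
            if not word.endswith(suffix):
                continue
            stem = word[:len(word) - len(suffix)]
            if not stem:
                continue
            # Generate every form the paradigm predicts and count how many
            # actually occur in the corpus vocabulary.
            generated = {stem + s for s in suffixes}
            attested = len(generated & corpus_vocab)
            hypotheses.append((attested, frequency, stem, sorted(suffixes)))
    hypotheses.sort(reverse=True)
    return hypotheses


# Toy, made-up data: an incomplete lexicon and a corpus vocabulary.
lexicon = {
    "walk": {"walk", "walks", "walked", "walking"},
    "talk": {"talk", "talks", "talked", "talking"},
    "bake": {"bake", "bakes", "baked", "baking"},
}
corpus_vocab = {"email", "emails", "emailed", "emailing", "the", "was"}

paradigms = induce_paradigms(lexicon)
best = propose("emailed", paradigms, corpus_vocab)[0]
print(best)
```

With this toy data, the top-ranked hypothesis for the unknown word "emailed" is the stem "email" under the paradigm shared by "walk" and "talk", because all four forms that paradigm predicts are attested in the corpus vocabulary. The competing "bake"-style paradigm predicts forms such as "emaile" that never occur, so it ranks lower.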

Details

Paper ID
lrec2002-main-133
Pages
N/A
BibKey
grefenstette-etal-2002-expanding
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29-31 May 2002

Authors

  • Gregory Grefenstette
  • Yan Qu
  • David A. Evans
