Back to Main Conference 2008
LREC 2008main

Acquiring a Poor Man’s Inflectional Lexicon for German

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/37o2pnze63v6

Abstract

Many NLP modules and applications require the availability of a module for wide-coverage inflectional analysis. One way to obtain such analyses is to use a morphological analyser in combination with an inflectional lexicon. Since large text corpora nowadays are easily available and inflectional systems are in general well understood, it seems feasible to acquire lexical data from raw texts, guided by our knowledge of inflection. I present an acquisition method along these lines for German. The general idea can be roughly summarised as follows: first, generate a set of lexical entry hypotheses for each word-form in the corpus; then, select hypotheses that explain the word-forms found in the corpus “best”. To this end, I have turned an existing morphological grammar, cast in finite-state technology (Schmid et al. 2004), into a hypothesiser for lexical entries. Irregular forms are simply listed so that they do not interfere with the regular rules used in the hypothesiser. Running the hypothesiser on a text corpus yields a large number of lexical entry hypotheses. These are then ranked according to their validity with the help of a statistical model that is based on the number of attested and predicted word forms for each hypothesis.

Details

Paper ID
lrec2008-main-444
Pages
N/A
BibKey
adolphs-2008-acquiring
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • PA

    Peter Adolphs

Links