Back to Main Conference 2004
LREC 2004main

Morphology Based Automatic Acquisition of Large-coverage Lexica

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/3g5qu9artced

Abstract

In this article, we introduce a new technique for constructing wide-coverage morphological lexica from large corpora and morphological knowledge, with an application to French. Basically, it relies on the idea that the existence of a hypothetical lemma can be guessed if several different words found in the corpus are best interpreted as morphological variants of this lemma. We first validated our technique by extracting verbs and adjectives on a general French corpus of 25 million words. Compared with other lexical resources available for French, our results are very satisfying, since we cover many words, often derived words, that are not always present in other lexica. Application of our algorithm to the acquisition of domain-specific adjectives on a botanic corpus gave also very good results, thus demonstrating its usability to extract domain-specific lexica. Moreover, it is generalizable to any language with a substantial morphology. Part of the resulting lexicon (currently verbal forms) is already freely available on http://www.lefff.net/.

Details

Paper ID
lrec2004-main-457
Pages
N/A
BibKey
clement-etal-2004-morphology
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 28 May 2004

Authors

  • LC

    Lionel Clément

  • BS

    Benoît Sagot

  • BL

    Bernard Lang

Links