
Derivation in the Czech National Corpus

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/4b6tev9g77t8

Abstract

The aim of this paper is to describe one of the main means of Czech word formation: derivation. New Czech words are created by composition or by derivation (using prefixes or suffixes). Suffixes, which are added to the stem, are used much more frequently than prefixes, which stand before the stem. The most frequent suffixes will be classified according to their paradigmatic and semantic properties and according to the changes they cause in the stem. The research is carried out on the Czech National Corpus (CNC); the frequencies of the investigated suffixes illustrate their productivity in present-day Czech. This research is of particular value for a highly inflected language such as Czech. Possible applications include various NLP systems, e.g. spelling checkers and machine translation systems. The results of this work serve the computational processing of Czech word formation and, in the future, the creation of a Czech derivational dictionary.

Details

Paper ID
lrec2000-main-112
Pages
N/A
BibKey
klimova-kocek-2000-derivation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 - 2 June 2000

Authors

  • Jana Klímová

  • Jan Kocek
