Derivation in the Czech National Corpus

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

Abstract

The aim of this paper is to describe one the main means of Czech word formation - derivation. New Czech words are created by composition or by derivation (by using prefixes or suffixes). The suffixes which are added to the stem are used much more frequently than prefixes standing before the stem. The most frequent suffixes will be classified according to the paradigmatic and semantic properties and according to the changes they cause in the stem. The research is done on the Czech national corpus (CNC), the frequencies of the investigated suffixes illustrate their roductivity in present day Czech language. This research is of a particular value for a highly inflected language such as Czech. Possible applications of this system are various NLP systems, e.g. spelling checkers and machine translation systems. The results of this work serve for the computational processing of Czech word formation and in future for the creation of the Czech derivational dictionary.