Quantitative parameters in corpus design: Estimating the optimum text size in Modern Greek language
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)
Abstract
The aim of this paper is to investigate the major quantitative parameters related to the definition of the optimum text size in Modern Greek corpus development. Using the Hellenic National Corpus (HNC) (Hatzigeorgiu et al., 2000) as a reference point, we estimated a number of critical statistical measures regarding feature counting in different text sizes. The results indicate that frequent linguistic features behave differently from medium-frequency and rare ones, and that increasing the text size does not affect them uniformly.