Automatic Style Categorisation of Corpora in the Greek Language
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)
Abstract
In this article, a system is proposed for the automatic style categorisation of text corpora in the Greek language. This categorisation is based to a large extent on the type of language used in the text, for example whether the language is representative of formal Greek or not. To arrive at this categorisation, the highly inflectional nature of the Greek language is exploited. For each text, a vector of both structural and morphological characteristics is assembled. Categorisation is achieved by comparing this vector to given archetypes using a statistically based method. Experimental resu