LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title Automatic Style Categorisation of Corpora in the Greek Language
Authors Tambouratzis George (Institute for Language and Speech Processing, Epidavrou & Artemidos 6, 151 25 Maroussi, Greece, giorg_t@ilsp.gr)
Markantonatou Stella (Institute for Language and Speech Processing, Epidavrou & Artemidos 6, 151 25 Maroussi, Greece, marks@ilsp.gr)
Hairetakis Nikolaos (Institute for Language and Speech Processing, Epidavrou & Artemidos 6, 151 25 Maroussi, Greece, nhaire@ilsp.gr)
Carayannis George (Institute for Language and Speech Processing, Epidavrou & Artemidos 6, 151 25 Maroussi, Greece, gcara@ilsp.gr)
Keywords Automated Style Categorisation, Grammatical Rules, Greek Language, Masking-and-Matching Technique, Morphological Processing
Session Session WO3 - Corpus Categorisation
Full Paper 301.ps, 301.pdf
Abstract In this article, a system is proposed for the automatic style categorisation of text corpora in the Greek language. This categorisation is based largely on the type of language used in the text, for example on whether or not the language is representative of formal Greek. To arrive at this categorisation, the system exploits the highly inflectional nature of the Greek language. For each text, a vector of structural and morphological characteristics is assembled. Categorisation is then achieved by comparing this vector to given archetypes using a statistical method. Experimental resu
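The abstract does not say which features or which statistical comparison the system uses. As a hedged illustration only, the sketch below assumes that each style archetype is summarised by a per-feature mean and standard deviation, and that a text is assigned to the archetype with the smallest standardised (z-score) Euclidean distance from its feature vector. The feature names (`mean_sentence_length`, `formal_ending_ratio`, `participle_ratio`), the archetype values and the distance measure are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Sequence

# Hypothetical feature set: names are illustrative assumptions, not the paper's features.
FEATURES = ("mean_sentence_length", "formal_ending_ratio", "participle_ratio")

# Hypothetical archetypes: per-feature mean and standard deviation for each style.
# All numbers are invented for illustration only.
ARCHETYPES: Dict[str, Dict[str, Sequence[float]]] = {
    "formal":   {"mean": (24.0, 0.30, 0.08), "std": (5.0, 0.05, 0.02)},
    "informal": {"mean": (12.0, 0.05, 0.02), "std": (4.0, 0.03, 0.01)},
}


def standardized_distance(x, mean, std):
    """Euclidean distance after scaling each feature difference by the archetype's std."""
    return math.sqrt(sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, std)))


def categorise(vector, archetypes=ARCHETYPES):
    """Return the name of the archetype whose standardised distance to `vector` is smallest.

    Assumption: nearest-archetype matching stands in for the paper's unspecified
    statistical comparison.
    """
    if len(vector) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} features, got {len(vector)}")
    return min(
        archetypes,
        key=lambda name: standardized_distance(
            vector, archetypes[name]["mean"], archetypes[name]["std"]
        ),
    )


if __name__ == "__main__":
    text_vector = (22.5, 0.27, 0.07)  # hypothetical feature vector for one text
    print(categorise(text_vector))  # prints "formal"
```

Scaling each difference by the archetype's standard deviation keeps a feature measured in words per sentence from swamping one measured as a proportion; a real system would estimate these statistics from labelled training corpora.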