DELOS: An Automatically Tagged Economic Corpus for Modern Greek
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)
Abstract
Text corpora have become an essential resource for Natural Language Processing tasks in recent years. A wide range of applications, such as information retrieval and ontology and terminology extraction, require a corpus that is sufficiently large but restricted to a specific domain. Manual tagging of such a corpus is very costly, which makes automatic annotation by a set of linguistic tools an appealing, though challenging, alternative. DELOS, described in this paper, is a Modern Greek corpus in the economic domain consisting of 5 million word tokens, automatically tagged for morphology and shallow syntactic relations. The annotation tools are combined in an integrated system and applied to the corpus using the GATE text engineering platform. The system output is a textual database marked up with the annotation tagset, available in plain text as well as in XML format.