DELOS: An Automatically Tagged Economic Corpus for Modern Greek

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/2g6d7oi9rn9j

Abstract

Text corpora have become an essential resource for Natural Language Processing tasks in recent years. A wide range of applications, such as information retrieval and ontology and terminology extraction, require a corpus that is sufficiently large but restricted to a single domain. Manual tagging of such a corpus is very costly, making automatic annotation by a set of linguistic tools a very challenging idea. DELOS, described in this paper, is a Modern Greek corpus from the economic domain consisting of 5 million word tokens, automatically tagged for morphology and shallow syntactic relations. The annotation tools described are combined in an integrated system and applied to the corpus using the GATE text engineering platform. The system output is a textual database marked up with the annotation tagset, in plain text as well as in XML format.

Details

Paper ID
lrec2002-main-172
Pages
N/A
BibKey
kermanidis-etal-2002-delos
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29-31 May 2002

Authors

  • Katia Lida Kermanidis

  • Nikos Fakotakis

  • George Kokkinakis