An Efficient and Flexible Format for Linguistic and Semantic Annotation

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/386nvb7rtshp

Abstract

The paper describes an XML annotation format and tool developed within the MUCHMORE project. The annotation scheme was designed specifically for the purposes of Cross-Lingual Information Retrieval in the medical domain so as to allow both efficient and flexible access to layers of information. We use a parallel English-German corpus of medical abstracts and annotate it with linguistic information (tokenisation, part-of-speech tagging, lemmatisation and decomposition, phrase recognition, grammatical functions) as well as semantic information from various sources. The annotation of medical terms/concepts, semantic types and semantic relations is based on the Unified Medical Language System (UMLS). Additionally, we use EuroWordNet as a general-language resource in annotating word senses and to compare domain-specific and general language use. A major aim of the project is also to complement existing ontological resources by extracting new terms and new semantic relations. We present the annotation scheme, which is conceptually related to stand-off annotation, and describe our tool for automatic semantic annotation.

Details

Paper ID
lrec2002-main-167
Pages
N/A
BibKey
vintar-etal-2002-efficient
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29-31 May 2002

Authors

  • Špela Vintar
  • Paul Buitelaar
  • Bärbel Ripplinger
  • Bogdan Sacaleanu
  • Diana Raileanu
  • Detlef Prescher
