The Concede Model for Lexical Databases

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/2iysctdturr2

Abstract

The value of language resources is greatly enhanced if they share a common markup with an explicit minimal semantics. Achieving this goal for lexical databases is difficult, as large-scale resources can realistically only be obtained by up-translation from pre-existing dictionaries, each with its own proprietary structure. This paper describes the approach we have taken in the Concede project, which aims to develop compatible lexical databases for six Central and Eastern European languages. Starting with sample entries from original presentation-oriented electronic representations of dictionaries, we transformed the data into an intermediate TEI-compatible representation to provide a common baseline for evaluating and comparing the dictionaries. We then developed a more restrictive encoding, formalised as an XML DTD with a clearly defined semantic interpretation. We present this DTD and discuss a sample conversion from TEI, together with an application which hyperlinks an HTML representation of the dictionary to on-line concordancing over a corpus.

Details

Paper ID
lrec2000-main-249
Pages
N/A
BibKey
erjavec-etal-2000-concede
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Tomaž Erjavec

  • Roger Evans

  • Nancy Ide

  • Adam Kilgarriff
