Back to Main Conference 2006
LREC 2006main

Perspectives of Turning Prague Dependency Treebank into a Knowledge Base

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/2xjpquih93ab

Abstract

Recently, the Prague Dependency Treebank 2.0 (PDT 2.0) has emerged as the largest text corpora annotated on the level of tectogrammatical representation (“linguistic meaning”) described in Sgall et al. (2004) and containing about 0.8 milion words (see Hajic (2004)). We hope that this level of annotation is so close to the meaning of the utterances contained in the corpora that it should enable us to automatically transform texts contained in the corpora to the form of knowledge base, usable for information extraction, question answering, summarization, etc. We can use Multilayered Extended Semantic Networks (MultiNet) described in Helbig (2006) as the target formalism. In this paper we discuss the suitability of such approach and some of the main issues that will arise in the process. In section 1, we introduce formalisms underlying PDT 2.0 and MultiNet, in section 2. We describe the role MultiNet can play in the system of Functional Generative Description (FGD), section 3 discusses issues of automatic conversion to MultiNet and section 4 gives some conclusions.

Details

Paper ID
lrec2006-main-137
Pages
N/A
BibKey
novak-hajic-2006-perspectives
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24 May 2006 26 May 2006

Authors

  • VN

    Václav Novák

  • JH

    Jan Hajič

Links