Back to Main Conference 2000
LREC 2000main

Semantico-syntactic Tagging of Very Large Corpora: the Case of Restoration of Nodes on the Underlying Level

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/5ox4ngnn9r6a

Abstract

The Prague Dependency Treebank has been conceived of as a semi-automatic three-layer annotation system, in which the layers of morphemic and 'analytic' (surface-syntactic) tagging are followed by the layer of tectogrammatical tree structures. Two types of deletions are recognized: (i) those licensed by the grammatical properties of the given sentence, and (ii) those possible only if the preceding context exhibits certain specific properties. Within group (i), either the position itself in the sentence structure is determined, but its lexical setting is 'free' (as e.g. with a deleted subject in Czech as a pro-drop language), or both the position and its 'filler' are determined. Group (ii) reflects the typological differences between English and Czech; the rich morphemics of the latter is more favorable for deletions. Several steps of the tagging procedure are carried out automatically, but most parts of the restoration of deleted nodes still have to be done ''manually''. If along with the node that is being restored, also nodes depending on it are deleted, then these are restored only if they function as arguments or obligatory adjuncts. The large set of annotated utterances will make it possible to check and amend the present results, also with applications of statistic methods. Theoretical linguistics will be enabled to check its descriptive framework; the degree of automation of the procedure will then be raised, and the treebank will be useful for most different tasks in language processing.

Details

Paper ID
lrec2000-main-014
Pages
N/A
BibKey
hajicova-sgall-2000-semantico
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 2 June 2000

Authors

  • EH

    Eva Hajičová

  • PS

    Petr Sgall

Links