Towards a Strategy for a Representation of Collocations - Extending the Danish PAROLE-lexicon

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/2ejc82yqn5wv

Abstract

We describe our attempts to formulate a pragmatic definition and a partial typology of the lexical category of ‘collocation’, taking both lexicographical and computational aspects into consideration. This provides a suitable basis for encoding collocations in an NLP-lexicon. Further, this paper explains the principles of an operational encoding strategy which is applied to a core section of the typology, namely to subtypes of verbal collocation. This strategy is adapted to a pre-defined lexicon model which has been developed in the PAROLE-project. The work is carried out within the framework of the STO-project, the aim of which is to extend the Danish PAROLE-lexicon. The encoding of collocations, in addition to single-word lemmas, greatly increases the lexical and linguistic coverage and thereby also the usability of the lexicon as a whole. Decisions concerning the selection of the most frequent types of collocation to be encoded are based on empirical data, i.e. corpus-based recognition. We present linguistic descriptions with a focus on some characteristic syntactic features of collocations that are observed in a newspaper corpus. We then give a few prototypical examples, provided with formalised descriptions, in order to illustrate the restriction features. Finally, we discuss the perspectives of the work done so far.

Details

Paper ID
lrec2000-main-035
Pages
N/A
BibKey
braasch-olsen-2000-towards
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Anna Braasch

  • Sussi Olsen

Links