LREC 2010 Main Conference

A Survey of Idiomatic Preposition-Noun-Verb Triples on Token Level

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/3gjwb3a289x2

Abstract

Most research on the extraction of idiomatic multiword expressions (MWEs) has focused on the acquisition of MWE types. In the present work, we investigate whether a given text instance of a potentially idiomatic MWE is actually used idiomatically in its context. Inspired by the dataset provided by Cook et al. (2008), we manually analysed 9,700 instances of potentially idiomatic preposition-noun-verb triples (a frequent pattern among German MWEs) to distinguish, on token level, idiomatic from literal uses. In our dataset, all sentences are provided along with their morpho-syntactic properties. We describe our data extraction and annotation steps and discuss quantitative results from both EUROPARL and a German newspaper corpus. We also discuss the relationship between idiomaticity and morpho-syntactic fixedness, and we address issues of ambiguity between literal and idiomatic uses of MWEs. Our data show that EUROPARL is particularly well suited for MWE extraction, as most MWEs in this corpus are indeed used only idiomatically.
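To make the type-level versus token-level distinction concrete, the sketch below aggregates token-level labels into a per-type share of idiomatic uses, and flags types that occur only idiomatically. It is a minimal illustration under stated assumptions, not the authors' tooling: the `Instance` record, the label strings "idiomatic"/"literal", the helper names `idiomatic_ratio` and `only_idiomatic`, and the toy triples are all hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Instance(NamedTuple):
    """One annotated corpus token of a preposition-noun-verb triple (hypothetical schema)."""
    prep: str
    noun: str
    verb: str
    label: str  # assumed label set: "idiomatic" or "literal"


def idiomatic_ratio(instances):
    """Map each (prep, noun, verb) type to the fraction of its tokens labelled idiomatic."""
    counts = defaultdict(Counter)
    for inst in instances:
        counts[(inst.prep, inst.noun, inst.verb)][inst.label] += 1
    return {
        triple: c["idiomatic"] / sum(c.values())
        for triple, c in counts.items()
    }


def only_idiomatic(ratios):
    """Return the types whose tokens were all labelled idiomatic, sorted."""
    return sorted(triple for triple, r in ratios.items() if r == 1.0)


# Toy data invented for illustration only; not from the paper's annotations.
toy = [
    Instance("in", "Frage", "stellen", "idiomatic"),
    Instance("in", "Frage", "stellen", "idiomatic"),
    Instance("auf", "Tisch", "legen", "literal"),
    Instance("auf", "Tisch", "legen", "idiomatic"),
]

ratios = idiomatic_ratio(toy)
for triple, r in sorted(ratios.items()):
    print(" ".join(triple), f"{r:.2f}")
print(only_idiomatic(ratios))
```

A type that is always used idiomatically (ratio 1.0) is the kind of case the abstract reports as dominant in EUROPARL; mixed ratios correspond to the literal/idiomatic ambiguity the paper discusses.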

Details

Paper ID
lrec2010-main-504
Pages
N/A
BibKey
fritzinger-etal-2010-survey
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17-23 May 2010

Authors

  • Fabienne Fritzinger
  • Marion Weller
  • Ulrich Heid
