
A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/3r6a3rqfo5ij

Abstract

In order to extract meaningful phrases from corpora (e.g., in an information retrieval context), intensive knowledge of the domain in question and of the respective documents is generally needed. When moving to a new domain or language, the underlying knowledge bases and models need to be adapted, which is often time-consuming and labor-intensive. This paper addresses the challenge of phrase extraction from documents in different domains and languages and proposes an approach that does not use comprehensive lexica and can therefore be easily transferred to new domains and languages. The effectiveness of the proposed approach is evaluated on user-generated content and documents from the patent domain in English and German.

Details

Paper ID
lrec2012-main-249
Pages
pp. 538-543
BibKey
schulz-etal-2012-resource
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21-27 May 2012

Authors

  • Julia Maria Schulz

  • Daniela Becks

  • Christa Womser-Hacker

  • Thomas Mandl
