Back to Main Conference 2010
LREC 2010main

A Contrastive Approach to Multi-word Extraction from Domain-specific Corpora

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/3azxg2rmvfwm

Abstract

In this paper, we present a novel approach to multi-word terminology extraction combining a well-known automatic term recognition approach, the C--NC value method, with a contrastive ranking technique, aimed at refining obtained results either by filtering noise due to common words or by discerning between semantically different types of terms within heterogeneous terminologies. Differently from other contrastive methods proposed in the literature that focus on single terms to overcome the multi-word terms' sparsity problem, the proposed contrastive function is able to handle variation in low frequency events by directly operating on pre-selected multi-word terms. This methodology has been tested in two case studies carried out in the History of Art and Legal domains. Evaluation of achieved results showed that the proposed two--stage approach improves significantly multi--word term extraction results. In particular, for what concerns the legal domain it provides an answer to a well-known problem in the semi--automatic construction of legal ontologies, namely that of singling out law terms from terms of the specific domain being regulated.

Details

Paper ID
lrec2010-main-379
Pages
N/A
BibKey
bonin-etal-2010-contrastive
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • FB

    Francesca Bonin

  • FD

    Felice Dell’Orletta

  • SM

    Simonetta Montemagni

  • GV

    Giulia Venturi

Links