Summary of the paper

Title A Contrastive Approach to Multi-word Extraction from Domain-specific Corpora
Authors Francesca Bonin, Felice Dell'Orletta, Simonetta Montemagni and Giulia Venturi
Abstract In this paper, we present a novel approach to multi-word terminology extraction combining a well-known automatic term recognition approach, the C-NC value method, with a contrastive ranking technique, aimed at refining the obtained results either by filtering noise due to common words or by discerning between semantically different types of terms within heterogeneous terminologies. Unlike other contrastive methods proposed in the literature, which focus on single terms to overcome the sparsity problem of multi-word terms, the proposed contrastive function is able to handle variation in low-frequency events by operating directly on pre-selected multi-word terms. This methodology has been tested in two case studies carried out in the History of Art and Legal domains. Evaluation of the results showed that the proposed two-stage approach significantly improves multi-word term extraction. In particular, in the legal domain it provides an answer to a well-known problem in the semi-automatic construction of legal ontologies, namely that of singling out law terms from terms of the specific domain being regulated.
Topics Ontologies; Tools, systems, applications; Multilinguality
Full paper A Contrastive Approach to Multi-word Extraction from Domain-specific Corpora
Slides A Contrastive Approach to Multi-word Extraction from Domain-specific Corpora
Bibtex @InProceedings{BONIN10.553,
  author = {Francesca Bonin and Felice Dell'Orletta and Simonetta Montemagni and Giulia Venturi},
  title = {A Contrastive Approach to Multi-word Extraction from Domain-specific Corpora},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }