Back to Main Conference 2008
LREC 2008main

L-ISA: Learning Domain Specific Isa-Relations from the Web

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/2j6tyy8ozjmn

Abstract

Automated extraction of ontological knowledge from text corpora is a relevant task in Natural Language Processing. In this paper, we focus on the problem of finding hypernyms for relevant concepts in a specific domain (e.g. Optical Recording) in the context of a concrete and challenging application scenario (patent processing). To this end information available on the Web is exploited. The extraction method includes four mains steps. Firstly, the Google search engine is exploited to retrieve possible instances of isa-patterns reported in the literature. Then, the returned snippets are filtered on the basis of lexico-syntactic criteria (e.g. the candidate hypernym must be expressed as a noun phrase without complex modifiers). In a further filtering step, only candidate hypernyms compatible with the target domain are kept. Finally a candidate ranking mechanism is applied to select one hypernym as output of the algorithm. The extraction method was evaluated on 100 concepts of the Optical Recording domain. Moreover, the reliability of isa-patterns reported in the literature as predictors of isa-relations was assessed by manually evaluating the template instances remaining after lexico-syntactic filtering, for 3 concepts of the same domain. While more extensive testing is needed the method appears promising especially for its portability across different domains.

Details

Paper ID
lrec2008-main-180
Pages
N/A
BibKey
potrich-pianta-2008-l
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • AP

    Alessandra Potrich

  • EP

    Emanuele Pianta

Links