Back to Main Conference 2012
LREC 2012main

Creation of a bottom-up corpus-based ontology for Italian Linguistics

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/5oqypsvt7yt4

Abstract

This paper describes the steps of construction of a shallow lexical ontology of Italian Linguistics, set to be used by a meta-search engine for query refinement. The ontology was constructed with the software Protégé 4.0.2 and is in OWL format; its construction has been carried out following the steps described in the well-known Ontology Learning From Text (OLFT) layer cake. The starting point was the automatic term extraction from a corpus of web documents concerning the domain of interest (304,000 words); as regards corpus construction, we describe the main criteria of the web documents selection and its critical points, concerning the definition of user profile and of degrees of specialisation. We describe then the process of term validation and construction of a glossary of terms of Italian Linguistics; afterwards, we outline the identification of synonymic chains and the main criteria of ontology design: top classes of ontology are Concept (containing taxonomy of concepts) and Terms (containing terms of the glossary as instances), while concepts are linked through part-whole and involved-role relation, both borrowed from Wordnet. Finally, we show some examples of the application of the ontology for query refinement.

Details

Paper ID
lrec2012-main-426
Pages
pp. 2641-2647
BibKey
bianchi-etal-2012-creation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 27 May 2012

Authors

  • EB

    Elisa Bianchi

  • MT

    Mirko Tavosanis

  • EG

    Emiliano Giovannetti

Links