An Automatic Method for Constructing Domain-Specific Ontology Resources

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

Abstract

Both data flow across multiple independent applications and further natural language analysis require a common foundation of terms and relations. Such a foundation can provide an in-depth understanding of term equivalence within a domain sublanguage and serve as a model of concept relations and dependencies. In this paper we discuss a domain-independent, corpus-based method for the automatic, dictionary-less extraction of ontological knowledge from unannotated domain-specific documents. We present the architecture, algorithms, and results for OntoStruct, a system that uses machine learning and statistical techniques to analyze text sources, discover terms, link equivalent terms into concepts, and learn both hierarchical and non-hierarchical conceptual relations. We report OntoStruct's results in constructing domain-specific ontological resources, together with an empirical evaluation of their quality.