Compiling large language resources using lexical similarity metrics for domain taxonomy learning

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

Abstract

In this contribution we present a new methodology for compiling large language resources for domain-specific taxonomy learning. We describe the stages needed to handle the rich morphology of an agglutinative language, namely Korean, and outline a second-order machine learning algorithm that uncovers term similarity from a raw text corpus. The language resource compilation described here is part of a fully automatic, top-down approach to taxonomy construction that dispenses with the human effort such construction usually requires.
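The abstract does not spell out the algorithm, so the sketch below rests on an assumption: it reads "second-order" term similarity in the common sense of second-order co-occurrence. Two terms count as similar when the words they co-occur with share contexts of their own, even if the two terms never share a context directly. The window size, raw co-occurrence counts, cosine measure, and all function names (`first_order_vectors`, `second_order_vectors`, `cosine`) are hypothetical illustration choices, not the authors' method. The input is assumed to be already morphologically analysed, so that Korean word forms have been split into stems and particles.

```python
import math
from collections import Counter, defaultdict


def first_order_vectors(sentences, window=2):
    """Map each term to a Counter of the terms co-occurring within `window` positions."""
    vecs = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][tokens[j]] += 1
    return dict(vecs)


def second_order_vectors(vecs):
    """Represent each term by the count-weighted sum of its context words' own context vectors."""
    so = {}
    for w, ctx in vecs.items():
        acc = Counter()
        for c, n in ctx.items():
            for k, v in vecs.get(c, Counter()).items():
                acc[k] += n * v
        so[w] = acc
    return so


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either vector is empty."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Toy corpus (hypothetical): "doctor" and "physician" never share a direct neighbour,
    # but their neighbours "treats" and "cures" both co-occur with "illness".
    corpus = [
        ["doctor", "treats"], ["physician", "cures"],
        ["treats", "illness"], ["cures", "illness"],
        ["farmer", "grows"], ["grows", "rice"],
    ]
    first = first_order_vectors(corpus, window=1)
    second = second_order_vectors(first)
    print("first-order :", round(cosine(first["doctor"], first["physician"]), 3))
    print("second-order:", round(cosine(second["doctor"], second["physician"]), 3))
    print("unrelated   :", round(cosine(second["doctor"], second["farmer"]), 3))
```

On this toy corpus the first-order similarity of "doctor" and "physician" is 0, while their second-order similarity is about 0.5 and the unrelated pair "doctor"/"farmer" stays at 0. This illustrates why a second-order measure can surface related terms that rarely appear in identical contexts in a raw corpus.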