Back to Main Conference 2006
LREC 2006main

A Study on Terminology Extraction Based on Classified Corpora

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/3jaadfo8accv

Abstract

Algorithms for automatic term extraction in a specific domain should consider at least two issues, namely Unithood and Termhood (Kageura, 1996). Unithood refers to the degree of a string to occur as a word or a phrase. Termhood (Chen Yirong, 2005) refers to the degree of a word or a phrase to occur as a domain specific concept. Unlike unithood, study on termhood is not yet widely reported. In classified corpora, the class information provides the cue to the nature of data and can be used in termhood calculation. Three algorithms are provided and evaluated to investigate termhood based on classified corpora. The three algorithms are based on lexicon set computing, term frequency and document frequency, and the strength of the relation between a term and its document class respectively. Our objective is to investigate the effects of these different termhood measurement features. After evaluation, we can find which features are more effective and also, how we can improve these different features to achieve the best performance. Preliminary results show that the first measure can effectively filter out independent terms or terms of general use.

Details

Paper ID
lrec2006-main-236
Pages
N/A
BibKey
chen-etal-2006-study
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24 May 2006 26 May 2006

Authors

  • YC

    Yirong Chen

  • QL

    Qin Lu

  • WL

    Wenjie Li

  • ZS

    Zhifang Sui

  • LJ

    Luning Ji

Links