Back to Main Conference 2000
LREC 2000main

Automatically Augmenting Terminological Lexicons from Untagged Text

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/3bw4pz8ohtbb

Abstract

Lexical resources play a crucial role in language technology but lexical acquisition can often be a time-consuming, laborious and costly exercise. In this paper, we describe a method for the automatic acquisition of technical terminology from domain restricted texts without the need for sophisticated natural language processing tools, such as taggers or parsers, or text corpora annotated with labelled cases. The method is based on the idea of using prior or seed knowledge in order to discover co-occurrence patterns for the terms in the texts. A bootstrapping algorithm has been developed that identifies patterns and new terms in an iterative manner. Experiments with scientific journal abstracts in the biology domain indicate an accuracy rate for the extracted terms ranging from 58% to 71%. The new terms have been found useful for improving the coverage of a system used for terminology identification tasks in the biology domain.

Details

Paper ID
lrec2000-main-240
Pages
N/A
BibKey
demetriou-gaizauskas-2000-automatically
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 2 June 2000

Authors

  • GD

    George Demetriou

  • RG

    Robert Gaizauskas

Links