Back to Main Conference 2014
LREC 2014main

Bootstrapping Term Extractors for Multiple Languages

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/39xnmetugf4d

Abstract

Terminology extraction resources are needed for a wide range of human language technology applications, including knowledge management, information extraction, semantic search, cross-language information retrieval and automatic and assisted translation. We create a low cost method for creating terminology extraction resources for 21 non-English EU languages. Using parallel corpora and a projection method, we create a General POS Tagger for these languages. We also investigate the use of EuroVoc terms and Wikipedia corpus to automatically create term grammar for each language. Our results show that these automatically generated resources can assist term extraction process with similar performance to manually generated resources. All resources resulted in this experiment are freely available for download.

Details

Paper ID
lrec2014-main-364
Pages
pp. 483-489
BibKey
aker-etal-2014-bootstrapping
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 31 May 2014

Authors

  • AA

    Ahmet Aker

  • MP

    Monica Paramita

  • EB

    Emma Barker

  • RG

    Robert Gaizauskas

Links