Back to Main Conference 2008
LREC 2008main

Generating a Morphological Lexicon of Organization Entity Names

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/26nndgqhb3p4

Abstract

This paper describes methods used for generating a morphological lexicon of organization entity names in Croatian. This resource is intended for two primary tasks: template-based natural language generation and named entity identification. The main problems concerning the lexicon generation are high level of inflection in Croatian and low linguistic quality of the primary resource containing named entities in normal form. The problem is divided into two subproblems concerning single-word and multi-word expressions. The single-word problem is solved by training a supervised learning algorithm called linear successive abstraction. With existing common language morphological resources and two simple hand-crafted rules backing up the algorithm, accuracy of 98.70% on the test set is achieved. The multi-word problem is solved through a semi-automated process for multi-word entities occurring in the first 10,000 named entities. The generated multi-word lexicon will be used for natural language generation only while named entity identification will be solved algorithmically in forthcoming research. The single-word lexicon is capable of handling both tasks.

Details

Paper ID
lrec2008-main-538
Pages
N/A
BibKey
ljubesic-etal-2008-generating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • NL

    Nikola Ljubešić

  • TL

    Tomislava Lauc

  • DB

    Damir Boras

Links