LREC 2014, main conference

Clustering of Multi-Word Named Entity variants: Multilingual Evaluation

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/329pw6zqodk6

Abstract

Multi-word entities, such as organisation names, are frequently written in many different ways. We have previously automatically identified over one million acronym pairs in 22 languages, consisting of their short form (e.g. EC) and their corresponding long forms (e.g. European Commission, European Union Commission). In order to automatically group such long form variants as belonging to the same entity, we cluster them, using bottom-up hierarchical clustering and pair-wise string similarity metrics. In this paper, we address the issue of how to evaluate the named entity variant clusters automatically, with minimal human annotation effort. We present experiments that make use of Wikipedia redirection tables and we show that this method produces good results.
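The clustering and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity metrics, linkage, or threshold were used, so `difflib.SequenceMatcher`, average linkage, and the `threshold` value below are stand-in assumptions. The function names and the toy redirect table are hypothetical too, and scoring clusters by pairwise precision and recall against redirect targets is only one plausible reading of "make use of Wikipedia redirection tables".

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in metric, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_variants(variants, threshold=0.7):
    """Bottom-up clustering: start from singletons and repeatedly merge the two
    most similar clusters (average linkage) until no pair reaches `threshold`."""
    clusters = [[v] for v in variants]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            scores = [similarity(a, b) for a in clusters[i] for b in clusters[j]]
            score = sum(scores) / len(scores)
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def pairwise_scores(clusters, redirects):
    """Pairwise precision and recall of clusters against a redirect table.

    `redirects` maps a variant to the page title it redirects to (hypothetical
    toy data here); variants missing from the table are ignored.
    """
    cluster_of = {v: idx for idx, c in enumerate(clusters) for v in c}
    known = [v for v in cluster_of if v in redirects]
    predicted = gold = both = 0
    for a, b in combinations(known, 2):
        same_cluster = cluster_of[a] == cluster_of[b]
        same_target = redirects[a] == redirects[b]
        predicted += same_cluster
        gold += same_target
        both += same_cluster and same_target
    precision = both / predicted if predicted else 0.0
    recall = both / gold if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    variants = [
        "European Commission",
        "European Union Commission",
        "World Health Organization",
        "World Health Organisation",
    ]
    # Hypothetical redirect table: variant -> target page title.
    redirects = {
        "European Commission": "European Commission",
        "European Union Commission": "European Commission",
        "World Health Organization": "World Health Organization",
        "World Health Organisation": "World Health Organization",
    }
    clusters = cluster_variants(variants)
    print(clusters)
    print(pairwise_scores(clusters, redirects))
```

The pairwise loop is quadratic per merge, which is acceptable for the handful of long forms attached to one acronym but would need a more efficient scheme for large variant sets.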

Details

Paper ID
lrec2014-main-396
Pages
pp. 2548-2553
BibKey
jacquet-etal-2014-clustering
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26-31 May 2014

Authors

  • Guillaume Jacquet
  • Maud Ehrmann
  • Ralf Steinberger
