
A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/3phrkmees7nn

Abstract

This paper proposes a simple and fast person-name filter, which plays an important role in the automatic compilation of a large bilingual person-name lexicon. The filter is based on pn_score, the sum of two component scores: the score of the first name and the score of the last name. Each component score is calculated from two term sets: a dense set, in which most members are person names, and a baseline set, which contains fewer person names. The pn_score takes one of five values, {+2, +1, 0, -1, -2}, which correspond to strong positive, positive, undecidable, negative, and strong negative, respectively. It extends easily to a bilingual pn_score, which takes one of nine values, by summing the scores of the two languages. Experimental results show that our method works well for monolingual person names in English and Japanese, with F-scores of 0.929 and 0.939, respectively. The bilingual person-name filter performs better still, with an F-score of 0.955.
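
The score arithmetic described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a component score is derived from the dense and baseline sets, so the membership rule in `component_score` is an assumption made only for illustration (+1 if the term occurs only in the dense set, -1 if only in the baseline set, 0 otherwise). The function names, the toy term sets, and the use of a single shared baseline set are likewise hypothetical and not taken from the paper.

```python
from typing import Set


def component_score(term: str, dense: Set[str], baseline: Set[str]) -> int:
    """Score one name part as +1, 0, or -1.

    ASSUMPTION: this membership rule is a stand-in; the abstract only says
    that the score is calculated from a dense set and a baseline set.
    """
    in_dense = term in dense
    in_baseline = term in baseline
    if in_dense and not in_baseline:
        return 1
    if in_baseline and not in_dense:
        return -1
    return 0  # in both sets or in neither: undecidable


def pn_score(first: str, last: str,
             dense_first: Set[str], dense_last: Set[str],
             baseline: Set[str]) -> int:
    """Sum of first-name and last-name scores: one of {+2, +1, 0, -1, -2}."""
    return (component_score(first, dense_first, baseline)
            + component_score(last, dense_last, baseline))


def bilingual_pn_score(score_lang_a: int, score_lang_b: int) -> int:
    """Sum of two monolingual pn_scores: one of nine values, -4 to +4."""
    return score_lang_a + score_lang_b


# Toy term sets (hypothetical, for illustration only).
DENSE_FIRST = {"john", "mary"}
DENSE_LAST = {"smith", "sato"}
BASELINE = {"table", "river"}

if __name__ == "__main__":
    print(pn_score("john", "smith", DENSE_FIRST, DENSE_LAST, BASELINE))
    print(pn_score("river", "table", DENSE_FIRST, DENSE_LAST, BASELINE))
```

Under these assumptions, a candidate whose two parts both look like names scores +2, one whose parts both look like ordinary terms scores -2, and mixed or unknown cases fall in between. Summing the scores of the English and Japanese sides of a candidate pair gives the nine-valued bilingual score.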

Details

Paper ID
lrec2010-main-235
Pages
N/A
BibKey
sato-kaide-2010-person
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 to 23 May 2010

Authors

  • Satoshi Sato

  • Sayoko Kaide
