Identifying Named Entities in Text Databases from the Natural History Domain

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/4nu9bb98qqu9

Abstract

In this paper, we investigate whether it is possible to bootstrap a named entity tagger for textual databases by exploiting the database structure to automatically generate domain- and database-specific gazetteer lists. We compare three tagging strategies: (i) using the extracted gazetteers in a look-up tagger, (ii) using the gazetteers to automatically extract training data for a database-specific tagger, and (iii) using a generic named entity tagger. Our results suggest that automatically built gazetteers combined with a look-up tagger yield relatively good performance, and that generic taggers do not perform particularly well on this type of data.
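Strategy (i) can be illustrated with a minimal sketch. It is not the authors' implementation: the column names (`genus`, `location`), the entity labels, the function names, and the greedy longest-match rule are all assumptions chosen for illustration. The idea is that distinct values from database columns known to hold one entity type become gazetteer entries, which are then matched against free-text fields.

```python
from collections import defaultdict


def build_gazetteers(records, entity_columns):
    """Collect distinct cell values from selected columns as gazetteer entries.

    `entity_columns` maps a (hypothetical) column name to an entity label.
    """
    gazetteers = defaultdict(set)
    for record in records:
        for column, label in entity_columns.items():
            value = record.get(column, "").strip()
            if value:
                gazetteers[label].add(value.lower())
    return gazetteers


def lookup_tag(text, gazetteers, max_len=4):
    """Tag text by greedy longest match of token n-grams against the gazetteers."""
    tokens = text.split()
    tags = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).strip(".,;:")
            for label, entries in gazetteers.items():
                if span.lower() in entries:
                    match = (span, label)
                    break
            if match:
                break
        if match:
            tags.append(match)
            i += n  # skip past the matched span
        else:
            i += 1
    return tags


# Hypothetical records; the field names are assumptions, not the paper's schema.
records = [
    {"genus": "Rana", "location": "Costa Rica", "notes": "collected at night"},
    {"genus": "Bufo", "location": "Suriname", "notes": ""},
]
gazetteers = build_gazetteers(records, {"genus": "GENUS", "location": "LOCATION"})
tagged = lookup_tag("Specimen of Rana collected near Costa Rica, 1923.", gazetteers)
print(tagged)
```

A look-up tagger of this kind only finds strings already present in the database columns, which is one reason the paper also considers strategy (ii), training a tagger on automatically extracted data.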

Details

Paper ID: lrec2006-main-285
Pages: N/A
BibKey: sporleder-etal-2006-identifying
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 2-9517408-2-4
Conference: Fifth International Conference on Language Resources and Evaluation
Location: Genoa, Italy
Date: 24-26 May 2006

Authors

  • Caroline Sporleder
  • Marieke van Erp
  • Tijn Porcelijn
  • Antal van den Bosch
  • Pim Arntzen
