What in the world is a Shahab?: Wide Coverage Named Entity Recognition for Arabic

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/4wemjmnwuzzu

Abstract

This paper describes the development of CiceroArabic, the first wide-coverage named entity recognition (NER) system for Modern Standard Arabic. Capable of classifying 18 different named entity classes with an F-measure of over 85%, CiceroArabic utilizes a new 800,000-word annotated Arabic newswire corpus to achieve high performance without the need for hand-crafted rules or morphological information. In addition to describing results from our system, we show that accurate named entity annotation for a large number of semantic classes is feasible, even for very large corpora, and we discuss new techniques designed to boost agreement and consistency among annotators over a long-term annotation effort.

Details

Paper ID
lrec2006-main-217
Pages
N/A
BibKey
nezda-etal-2006-world
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24 May 2006 to 26 May 2006

Authors

  • Luke Nezda
  • Andrew Hickl
  • John Lehmann
  • Sarmad Fayyaz