Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/4ad43vah7ajs

Abstract

Within the framework of the Quaero project, we proposed a new definition of named entities that extends both their coverage and their structure. Under this definition, the extended named entities are both hierarchical and compositional. In this paper, we focus on the annotation of a corpus of press archives, OCRed from French newspapers of December 1890. We present the methodology used to produce the corpus and the characteristics of the corpus in terms of named entity annotation. This annotated corpus was used in an evaluation campaign. We present this evaluation, the metrics used, and the results obtained by the participants.

Details

Paper ID
lrec2012-main-166
Pages
pp. 3126-3131
BibKey
galibert-etal-2012-extended
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 to 27 May 2012

Authors

  • Olivier Galibert

  • Sophie Rosset

  • Cyril Grouin

  • Pierre Zweigenbaum

  • Ludovic Quintard

Links