Back to Main Conference 2016
LREC 2016main

OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/489o4i342ujk

Abstract

We present further work on evaluation of the fully automatic post-correction of Early Dutch Books Online, a collection of 10,333 18th century books. In prior work we evaluated the new implementation of Text-Induced Corpus Clean-up (TICCL) on the basis of a single book Gold Standard derived from this collection. In the current paper we revisit the same collection on the basis of a sizeable 1020 item random sample of OCR post-corrected strings from the full collection. Both evaluations have their own stories to tell and lessons to teach.

Details

Paper ID
lrec2016-main-154
Pages
pp. 967-974
BibKey
reynaert-2016-ocr
Editors
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 - 28 May 2016

Authors

  • MR

    Martin Reynaert

Links