OCR Post-Correction Evaluation of Early Dutch Books Online - Revisited
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Abstract
We present further work on the evaluation of the fully automatic post-correction of Early Dutch Books Online, a collection of 10,333 18th-century books. In prior work we evaluated the new implementation of Text-Induced Corpus Clean-up (TICCL) on the basis of a single-book Gold Standard derived from this collection. In the current paper we revisit the same collection on the basis of a sizeable random sample of 1,020 OCR post-corrected strings drawn from the full collection. Both evaluations have their own stories to tell and lessons to teach.