LREC 2026 workshop

PressMint QuickCheck: Operationalising Readiness Diagnostics for Interoperable Historical Newspaper Corpora

Proceedings of the First Workshop on Creating Interoperable Corpora of Historical Newspapers

DOI:10.63317/2pgz3sxyv56b

Abstract

PressMint QuickCheck is a lightweight, reproducible readiness diagnostic for historical newspaper collections. Given a candidate dataset (a ZIP export or IIIF manifests), it detects which components are present, identifies interoperability-critical metadata gaps, and applies basic OCR sanity checks. It produces three standardised artefacts: a human-readable readiness report, a minimal normalised manifest (CSV), and a tentative v1 scorecard (suitability_score, 0-4) for prioritising across collections. The workflow is delivered as a Colab-first notebook that requires no local installation. A key design decision is to treat content_language and metadata_language declarations as first-class interoperability signals, reflecting the multilingual scope of the PressMint and ParlaMint corpus projects.
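
To make the workflow described in the abstract concrete, the following is a minimal Python sketch of a readiness check of this kind. It is not the authors' notebook code. Only content_language, metadata_language and suitability_score come from the abstract. The other field names (id, title, date, ocr_text), the alphabetic-ratio OCR heuristic, the one-point-per-criterion rubric and the sample records are all illustrative assumptions.

```python
import csv
import io

# Assumed required fields: only content_language and metadata_language are
# named in the abstract; "title" and "date" are illustrative placeholders.
REQUIRED_FIELDS = ("title", "date", "content_language", "metadata_language")


def metadata_gaps(record):
    """Return the required fields that are missing or blank in one record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def ocr_looks_sane(text, min_alpha_ratio=0.6):
    """Crude OCR check: share of alphabetic characters among non-space ones."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    return sum(c.isalpha() for c in chars) / len(chars) >= min_alpha_ratio


def suitability_score(records):
    """Assumed rubric, one point each (0-4): records exist, no metadata gaps,
    both language fields declared everywhere, all OCR samples pass the check."""
    if not records:
        return 0
    score = 1
    if all(not metadata_gaps(r) for r in records):
        score += 1
    if all(r.get("content_language") and r.get("metadata_language") for r in records):
        score += 1
    if all(ocr_looks_sane(r.get("ocr_text", "")) for r in records):
        score += 1
    return score


def manifest_csv(records):
    """Serialise a minimal normalised manifest as CSV text, one row per record."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["id", *REQUIRED_FIELDS, "gaps"], lineterminator="\n"
    )
    writer.writeheader()
    for r in records:
        row = {k: r.get(k, "") for k in ("id", *REQUIRED_FIELDS)}
        row["gaps"] = ";".join(metadata_gaps(r))
        writer.writerow(row)
    return out.getvalue()


# Made-up sample records, for illustration only.
records = [
    {"id": "issue-001", "title": "Example Gazette", "date": "1898-05-01",
     "content_language": "es", "metadata_language": "en",
     "ocr_text": "Noticias de la provincia y del extranjero."},
    {"id": "issue-002", "title": "Example Gazette", "date": "1898-05-02",
     "content_language": "es", "metadata_language": "",
     "ocr_text": "|| ~~ 3# ..,, ;;"},
]

if __name__ == "__main__":
    print(manifest_csv(records))
    print("suitability_score:", suitability_score(records))
```

In this sketch the second record lacks a metadata_language declaration and fails the OCR heuristic, so the sample collection keeps only the point for having records at all. Treating a missing language declaration as a scored gap mirrors the abstract's emphasis on language fields as first-class signals, although the actual weighting used by QuickCheck is not given here.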

Details

Paper ID
lrec2026-ws-pressmint-03
Pages
pp. 11-15
BibKey
battanermoro-etal-2026-pressmint
Editors
Maciej Ogrodniczuk, Petya Osenova, Tanja Wissik
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the First Workshop on Creating Interoperable Corpora of Historical Newspapers
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Elena Battaner Moro

  • Almudena Caballos Villar

  • María Cuevas Riaño

  • Marina Miguez Lamanuzzi

  • Dolores Romero López
