LREC 2026 workshop

CLARIAH-ES PressMint: Building Interoperable Corpora of Historical Press in Spain

Proceedings of the First Workshop on Creating Interoperable Corpora of Historical Newspapers

DOI:10.63317/5pzk95q2kup3

Abstract

This paper describes CLARIAH-ES’s contribution to PressMint in Spain as a distributed effort across regional nodes (e.g., Catalonia, Madrid, Basque Country, Galicia, Canary Islands, Alicante), each developing manageable corpora in partnership with key repositories such as ARCA, Patrimonio Digital Complutense, Euskariana, Jable, Galiciana, and the BVMC periodicals portal. A central technical challenge is heterogeneous legacy OCR quality, motivating experiments with AI/LLM-assisted OCR renewal, normalization layers, and linguistic enrichment (e.g., NER and entity linking). This effort is situated alongside ongoing dissemination and the EOSC Mesh "historical newspapers" use-case work aimed at scalable discovery, access, and federated computation over interoperable historical press data.

Details

Paper ID: lrec2026-ws-pressmint-06
Pages: pp. 27-33
BibKey: estarrona-etal-2026-clariah
Editors: Maciej Ogrodniczuk, Petya Osenova, Tanja Wissik
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the First Workshop on Creating Interoperable Corpora of Historical Newspapers
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Ainara Estarrona
  • Aritz Farwell
  • German Rigau
  • Xabier Goenaga
