LREC 2026 Workshop

Towards an Interoperable Corpus of Austrian Historical Newspapers: The case of PressMint-AT

Proceedings of the First Workshop on Creating Interoperable Corpora of Historical Newspapers

DOI:10.63317/4ionpc6w2rmt

Abstract

This paper presents the PressMint-AT project, which aims to create a historical newspaper corpus based on the Wiener Abendpost. The quality of automatic text recognition (ATR) is a key factor in creating historical newspaper corpora. The paper therefore evaluates the performance of established ATR tools, multimodal large language models (LLMs), and the existing full-text transcriptions provided by the Austrian National Library via ANNO in order to identify the most suitable approach for the PressMint-AT project. Although recent research has demonstrated promising results for OCR tasks using multimodal LLMs, the experiments presented in this paper show that PERO OCR achieves the best performance on the PressMint-AT dataset.
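The abstract does not say which metric was used to compare the ATR systems. As an illustration only, the sketch below assumes character error rate (CER), a metric commonly used for ATR evaluation: the Levenshtein edit distance between a system's output and a ground-truth transcription, divided by the length of the ground truth. The function names, system labels, and sample strings are hypothetical and are not data from the paper.

```python
# Illustrative sketch: CER is an assumed metric, not necessarily the one used in the paper.


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn hyp into ref (classic dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Invented sample strings for illustration only; not results from the paper.
    reference = "Wiener Abendpost"
    outputs = {
        "system_a": "Wiener Abendpoft",  # long s misread as 'f', a typical Fraktur error
        "system_b": "Wiener Abendpost",
    }
    scores = {name: cer(reference, hyp) for name, hyp in outputs.items()}
    best = min(scores, key=scores.get)  # lower CER means better recognition
    print(scores, best)
```

With a metric of this kind, the system with the lowest CER over a shared set of ground-truth pages would be preferred; in practice the scores would be aggregated over many lines or pages rather than a single string.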

Details

Paper ID
lrec2026-ws-pressmint-07
Pages
pp. 34-39
BibKey
wissik-etal-2026-interoperable
Editors
Maciej Ogrodniczuk, Petya Osenova, Tanja Wissik
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the First Workshop on Creating Interoperable Corpora of Historical Newspapers
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Tanja Wissik
  • Jona Hassenbach
  • Hannes Pirker
  • Claudia Resch
  • Stefan Resch
