Towards an Interoperable Corpus of Austrian Historical Newspapers: The case of PressMint-AT
Proceedings of the First Workshop on Creating Interoperable Corpora of Historical Newspapers
Abstract
This paper presents the PressMint-AT project, which aims to create a historical newspaper corpus based on the Wiener Abendpost. The quality of automatic text recognition (ATR) is a key factor in creating historical newspaper corpora. Therefore, the performance of established ATR tools, multimodal large language models (LLMs), and existing full-text transcriptions provided by the Austrian National Library via ANNO is evaluated in order to identify the most suitable approach for the PressMint-AT project. Even though recent research has demonstrated promising results for OCR tasks using multimodal LLMs, the experiments presented in this paper show that PERO OCR achieves the best performance on the PressMint-AT dataset.