Exploring the similarities and differences between VLM-driven and traditional OCR for Historical Swedish Data

The Fourth Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2026)

DOI:10.63317/4zr3kytoswtq

Abstract

Recent Swedish OCR efforts rely primarily on traditional OCR methods, including deep CNN–LSTM hybrid neural networks and transformer-based models. Some approaches have also demonstrated the applicability of VLM-driven OCR to historical material. However, to date, no studies have examined in depth the performance of VLM-based OCR on historical Swedish sources. In this paper, we ask: how do transformer-based models and VLMs differ in character- and word-level recognition performance across typefaces, and what qualitative differences can be observed in their error patterns? We show that fine-tuned versions of Alibaba Cloud's Qwen3-VL-8B-Instruct and Qwen3-VL-2B-Instruct, combined with a simple repetition-trimming step, outperform conventional OCR systems. Remaining errors are primarily attributable to challenges associated with the Blackletter typeface and to formatting issues, such as missing or extra line breaks, characters, and spaces. Even when characters are correctly recognized, formatting inconsistencies can substantially increase transcription error rates.
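The point about formatting can be illustrated with a minimal sketch of character and word error rates computed from Levenshtein distance. This is not the authors' evaluation code: the function names, the whitespace tokenization, and the sample strings are assumptions made for illustration. The `trim_repetitions` helper is likewise a hypothetical stand-in for a repetition-trimming step, since the abstract does not describe how the actual step works.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens (an assumption)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def trim_repetitions(text, max_repeats=1):
    """Hypothetical trimming step: keep at most `max_repeats` copies of any
    run of identical consecutive lines (this also collapses blank-line runs)."""
    out = []
    run = 0
    for line in text.splitlines():
        run = run + 1 if out and line == out[-1] else 1
        if run <= max_repeats:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up sample strings; every letter is recognized in both hypotheses.
    print(cer("Konungens\nbref", "Konungens bref"))   # a lost line break costs one character edit
    print(wer("Konungens bref", "Konun gens bref"))   # one extra space breaks a word
    print(trim_repetitions("a\nb\nb\nb\nc"))
```

In this sketch a single spurious space leaves every letter intact yet changes the word sequence, so the word-level score is affected far more than the character-level score. A lost line break, by contrast, counts as one character edit but is invisible to a whitespace-tokenized WER.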

Details

Paper ID: lrec2026-ws-resourceful-18
Pages: pp. 193-199
BibKey: johansson-etal-2026-exploring
Editors: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: The Fourth Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2026)
Location: Palma, Mallorca, Spain
Date: 11 - 16 May 2026

Authors

  • Martin Johansson
  • Selma Waginder
  • Dana Dannélls
