OldBERTur: Named Entity Recognition for Medieval Icelandic

Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @ LREC 2026

DOI:10.63317/36mey5zik2id

Abstract

We present OldBERTur, a Named Entity Recognition (NER) model for Old Icelandic, available in two variants: one for normalised texts and one for diplomatic texts. We fine-tune an existing BERT language model and, because training data is scarce, employ multiple training configurations, including pre-training domain adaptation, sentence-level data resampling, and data augmentation with modern Icelandic data. With these, we achieve an F1 score of 93 on normalised texts and 79 on diplomatic texts. We find that additional training configurations, such as resampling entity-annotated Old Icelandic texts, significantly improve performance in low-resource settings, while their effectiveness diminishes as the amount of available training data increases. Our models can be used to automatically identify and classify person and location names in texts from the rich medieval Icelandic literary tradition. The models, together with their data and code, are publicly available to support reuse and future research into medieval Scandinavian NLP and beyond.
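The sentence-level resampling mentioned in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function names, the `factor` parameter, the BIO tag names, and the toy sentences are all assumptions made for illustration. The sketch simply repeats every sentence that contains at least one entity tag, so that entity-bearing sentences are seen more often during fine-tuning.

```python
import random
from typing import List, Tuple

# A sentence is a list of (token, tag) pairs; the BIO tag names are hypothetical.
Sentence = List[Tuple[str, str]]


def has_entity(sentence: Sentence) -> bool:
    """Return True if any token carries a non-"O" tag."""
    return any(tag != "O" for _, tag in sentence)


def resample_sentences(
    sentences: List[Sentence], factor: int = 2, seed: int = 0
) -> List[Sentence]:
    """Repeat entity-bearing sentences `factor` times, keep the rest once, then shuffle."""
    rng = random.Random(seed)
    resampled: List[Sentence] = []
    for sentence in sentences:
        copies = factor if has_entity(sentence) else 1
        resampled.extend([sentence] * copies)
    rng.shuffle(resampled)
    return resampled


# Toy corpus, invented for illustration only.
corpus: List[Sentence] = [
    [("Egill", "B-Person"), ("fór", "O"), ("til", "O"), ("Noregs", "B-Location")],
    [("Þá", "O"), ("var", "O"), ("vetr", "O")],
]

augmented = resample_sentences(corpus, factor=3)
entity_count = sum(has_entity(s) for s in augmented)
```

With `factor=3`, the single entity-bearing sentence appears three times and the entity-free sentence once. Other resampling schemes (for example, weighting by entity type or rarity) are equally plausible readings of the abstract.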

Details

Paper ID
lrec2026-ws-lt4hala-49
Pages
pp. 469-481
BibKey
henningsson-etal-2026-oldbertur
Editors
Rachele Sprugnoli, Marco Passarotti
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Pontus Henningsson

  • Eva Pettersson

  • Erik Lenas
