A Growing Literature of the Public Sphere: Fiction in Danish Newspapers (1666–1850)

Proceedings of the First Workshop on Creating Interoperable Corpora of Historical Newspapers

DOI:10.63317/4ekyjogsy8zg

Abstract

Digitized literary corpora of the 19th century largely focus on standalone volumes, sidelining the broader and more diverse literary production of the period. Fiction published in less enduring formats – such as novellas and serialized pieces in newspapers – remains underexplored, particularly for low-resource languages like Danish, despite the growing availability of digitized newspaper archives. This paper addresses that gap by identifying and tagging fiction in Danish newspapers (1666–1850). We (1) present a manually annotated dataset of 1,831 articles with both binary labels (fiction/nonfiction) and fine-grained subcategories (travelogue, biography, essay), and (2) evaluate a document-embedding classifier that achieves an F1-score of up to 0.89 for the fiction/nonfiction distinction. Building on this pipeline, we further provide two resources for future research: (a) fiction probability scores for nearly five million newspaper articles (n=4,898,084), and (b) a small, cleaned, and curated subset of newspaper fiction (n=139), intended as a growing resource.

Details

Paper ID: lrec2026-ws-pressmint-08
Pages: pp. 40–49
BibKey: feldkamp-etal-2026-growing
Editors: Maciej Ogrodniczuk, Petya Osenova, Tanja Wissik
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the First Workshop on Creating Interoperable Corpora of Historical Newspapers
Location: Palma, Mallorca, Spain
Date: 11–16 May 2026

Authors

  • Pascale Feldkamp
  • Alie Lassche
  • Rie Eriksen
  • Kit Morgenstjerne
  • Kristoffer Nielbo
  • Johan Heinsen
  • Yuri Bizzoni

Links