LREC 2026, Main Conference

National Library as Corpus: DeLiKo-2025@DNB – a Very Large Corpus of German-language Contemporary Literature

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI: 10.63317/59wsms6588ys

Abstract

This paper introduces DeLiKo-2025@DNB, a very large, linguistically annotated corpus of German-language contemporary literature, freely accessible via https://korap.dnb.de/. The corpus currently comprises 21 billion words from over 287,000 books published between 2005 and the present, spanning pulp and genre fiction as well as award-winning literary works. It covers the entire holdings of fiction ebooks in EPUB format deposited with the German National Library (DNB). We provide a detailed account of the corpus composition, metadata, and key features. Additionally, we explain our strategy for enabling lawful and effective access by deploying the open-source corpus analysis platform KorAP at the DNB. Finally, we discuss the transferability of our approach to other national libraries, as well as our ongoing and planned extensions and enhancements.

Details

Paper ID
lrec2026-main-518
Pages
pp. 6528-6535
BibKey
kupietz-etal-2026-national
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Marc Kupietz

  • Nils Diewald

  • Philippe Genêt

  • Andreas Witt
