National Library as Corpus: DeLiKo-2025@DNB – a Very Large Corpus of German-language Contemporary Literature
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This paper introduces DeLiKo-2025@DNB, a very large, linguistically annotated corpus of German-language contemporary literature, freely accessible via https://korap.dnb.de/. The corpus currently comprises 21 billion words from over 287,000 books published between 2005 and the present, ranging from pulp and genre fiction to works that have won literary awards. It covers the entire holdings of EPUB-format fiction ebooks deposited with the German National Library (DNB). We provide a detailed account of the corpus composition, metadata, and key features. We also explain our strategy for enabling lawful and effective access by deploying the open-source corpus analysis platform KorAP at the DNB, and we discuss both the transferability of our approach to other national libraries and our ongoing and planned extensions and enhancements.