LREC 2026 main

An Extreme Multi-label Text Classification (XMTC) Library Dataset: What If We Took "Use of Practical AI in Digital Libraries" Seriously?

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5kag6gjg636f

Abstract

Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers’ work.

Details

Paper ID
lrec2026-main-012
Pages
pp. 169-184
BibKey
dsouza-etal-2026-extreme
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Jennifer D'Souza
  • Sameer Sadruddin
  • Maximilian Kaehler
  • Andrea Salfinger
  • Luca Zaccagna
  • Francesca Incitti
  • Lauro Snidaro
  • Osma Suominen

Links