LREC 2026 (main)

MELD: Melding Diverse Multilingual and Multi-Domain Datasets for Named Entity Recognition Evaluation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/32qrd24xac2e

Abstract

Zero-shot Named Entity Recognition (NER) has gained prominence for information extraction across diverse domains without being limited to a single, fixed tag set. However, existing NER resources vary widely in data format, licensing terms, annotation schemes, and availability, making it difficult to systematically evaluate the generalization capabilities of zero-shot NER models. Prior attempts to aggregate datasets with broad coverage across domains have largely focused on a small subset of languages, and it is often not transparent how datasets were processed from their sources. This paper introduces MELD, a comprehensive multilingual and multi-domain data collection designed to address these gaps. MELD integrates 60 NER datasets spanning 194 languages, 14 domains, and 601 normalized entity types. While previously introduced multilingual NER datasets are mainly silver-standard, MELD contains gold-standard annotations for 60 languages. All data processing steps are fully open-source and reproducible, facilitating future extensions and ensuring long-term accessibility. While MELD is primarily designed for zero-shot evaluation, it also provides training and development splits in a single, consistent format to support future research in few-shot and supervised NER settings.

Details

Paper ID
lrec2026-main-148
Pages
pp. 1889-1903
BibKey
glocker-etal-2026-meld
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 - 16 May 2026

Authors

  • Kevin Glocker

  • Marco Kuhlmann
