A Novel Synthetic Dataset for Few-Shot Legal Relation Extraction in German

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5oaz3mdtekn9

Abstract

The legal domain is particularly challenging for natural language processing due to the personal and confidential information it contains. Despite the significant advances of large language models (LLMs), applying them to relation extraction (RE) in legal texts remains challenging, not only because of the task’s linguistic and semantic complexity, but also due to privacy, compliance, and infrastructure constraints under regulations such as the EU AI Act. To address these challenges, we propose a novel synthetic dataset for German legal relation extraction, created using LLMs through a controlled, privacy-preserving, template-based pipeline. The dataset allows for reproducible and legally compliant experimentation. We benchmark it using two few-shot learning paradigms, a description-enhanced Model-Agnostic Meta-Learning (MAML) framework and Prototypical Networks with supervised contrastive loss and curriculum-aware prototype enrichment. Our results demonstrate that combining few-shot learning with structured semantic knowledge achieves robust and interpretable results, with the curriculum-aware Proto-Contrastive model reaching an F1-score of 99.83%.

Details

Paper ID
lrec2026-main-830
Pages
pp. 10579-10591
BibKey
nouri-etal-2026-novel
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Shiva Banasaz Nouri

  • Elena Leitner

  • Julian Moreno-Schneider

  • Georg Rehm

Links