
Ontology-Guided Synthetic Data Generation for Low-Resource Information Extraction: A Case Study in IT Heritage Domain

Proceedings of the Knowledge Graphs and Large Language Models Workshop (KG-LLM) @ LREC26

DOI:10.63317/2ze7afc3v7zy

Abstract

Information Extraction (IE) in specialized domains often suffers from a severe cold-start problem due to the high cost of expert annotation. Recent Reverse-IE approaches leverage knowledge graphs to generate synthetic training corpora, but typically assume the availability of an existing knowledge base. In this work, we propose an ontology-driven pipeline for synthetic supervision that removes this requirement. Starting from a formal domain ontology, we introduce a stochastic motif sampling strategy that constructs schema-consistent Knowledge Graph structures with controllable topology, which are then verbalized into natural language. This ontology-first formulation also allows direct control over the data generation process, enabling oversampling of underrepresented entity types or relation patterns. Applied to the IT Heritage domain, our approach produces a fully labeled NER/RE corpus without large-scale manual annotation. Evaluation in a low-resource setting shows that while the synthetic corpus lacks the linguistic diversity of gold data, its scalability produces training sets large enough to alleviate the cold-start problem, making ontology-guided motif generation a practical strategy for domains where gold annotation is limited.
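The pipeline described in the abstract can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' implementation: the three-relation ontology, the `sample_motif` and `verbalize` helpers, the `weights` parameter, and the sentence templates are all hypothetical, and fixed templates stand in for whatever verbalization step the paper actually uses. The sketch grows a connected motif edge by edge, only attaching relations whose domain or range matches an existing node, so every sampled triple is schema-consistent and doubles as an NER/RE label.

```python
import random

# Hypothetical toy ontology (not from the paper): relation -> (domain, range).
ONTOLOGY = {
    "designedBy": ("Computer", "Person"),
    "developedBy": ("Computer", "Organization"),
    "employedBy": ("Person", "Organization"),
}

# Placeholder templates standing in for the verbalization step (assumption).
TEMPLATES = {
    "designedBy": "{s} was designed by {o}.",
    "developedBy": "{s} was developed by {o}.",
    "employedBy": "{s} was employed by {o}.",
}


def sample_motif(ontology, n_edges, weights=None, seed=None):
    """Grow a connected set of typed triples that respects domain/range.

    Always returns at least one triple, even if n_edges is smaller than 1.
    """
    rng = random.Random(seed)
    weights = weights or {}
    counts = {}

    def new_node(cls):
        # Nodes are (identifier, class) pairs, e.g. ("Person_1", "Person").
        counts[cls] = counts.get(cls, 0) + 1
        return (f"{cls}_{counts[cls]}", cls)

    def pick(relations):
        # Relations missing from `weights` default to weight 1.0.
        w = [weights.get(r, 1.0) for r in relations]
        return rng.choices(relations, weights=w)[0]

    # Seed the motif with one edge.
    rel = pick(sorted(ontology))
    dom, ran = ontology[rel]
    subj, obj = new_node(dom), new_node(ran)
    nodes, triples = [subj, obj], [(subj, rel, obj)]

    # Attach each further edge to a randomly chosen existing node.
    while len(triples) < n_edges:
        anchor = rng.choice(nodes)
        usable = [r for r in sorted(ontology) if anchor[1] in ontology[r]]
        rel = pick(usable)
        dom, ran = ontology[rel]
        if anchor[1] == dom:
            new = new_node(ran)
            triples.append((anchor, rel, new))
        else:
            new = new_node(dom)
            triples.append((new, rel, anchor))
        nodes.append(new)
    return triples


def verbalize(triples):
    """Render triples as text; the triples themselves serve as the labels."""
    return " ".join(TEMPLATES[r].format(s=s[0], o=o[0]) for s, r, o in triples)


motif = sample_motif(ONTOLOGY, n_edges=4, seed=0)
print(verbalize(motif))
```

Raising the weight of a relation, for example `weights={"employedBy": 5.0}`, makes it more likely to be drawn, which is one simple way to oversample an underrepresented relation pattern. Topology control is limited here to the number of edges; a fuller version would also constrain the motif's shape.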

Details

Paper ID: lrec2026-ws-kgllm-08
Pages: pp. 73-81
BibKey: vuth-etal-2026-ontology
Editors: Gilles Sérasset, Katerina Gkirtzou, Michael Cochez, Jan-Christoph Kalo
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the Knowledge Graphs and Large Language Models Workshop (KG-LLM) @ LREC26
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Nakanyseth Vuth
  • Emrick Poncet
  • Gilles Sérasset
  • Didier Schwab
  • Caroline Djambian
  • Benjamin Lecouteux
