
Relation Extraction across Entire Books to Reconstruct Community Networks: The AffilKG Datasets

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4ddnnhq4dpfn

Abstract

When knowledge graphs (KGs) are automatically extracted from text, are they accurate enough for downstream analysis? Unfortunately, current annotated datasets cannot be used to answer this question: the knowledge graphs they correspond to, constructed by mapping entities in the text to nodes and relations to edges, are typically highly disconnected, too small, or overly complex. To address this gap, we introduce AFFILKG, a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs. Each dataset features affiliation graphs, which are simple KGs that capture MEMBER relationships between PERSON and ORGANIZATION entities. Such graphs are useful in studies of migration, community interactions, and other social phenomena. In addition, three datasets include expanded KGs with a wider variety of relation types. Our preliminary experiments show that model performance varies significantly across datasets, underscoring AFFILKG’s ability to enable two critical advances: (1) benchmarking how extraction errors propagate to graph-level analyses (e.g., community structure), and (2) validating KG extraction methods for real-world social science research.
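As a rough illustration of the idea in the abstract, the sketch below treats an affiliation graph as a set of (person, organization) MEMBER edges, scores an extracted graph against a gold graph at the edge level, and then compares a simple graph-level view (which pairs of people share an organization). This is not the authors' evaluation code. The function names `edge_prf` and `co_membership`, the toy names, and the choice of co-membership as the graph-level statistic are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def edge_prf(gold, pred):
    """Edge-level precision, recall, and F1 between two sets of
    (person, organization) MEMBER edges. Hypothetical helper."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def co_membership(edges):
    """Project the bipartite affiliation graph onto people: return the
    set of person pairs that share at least one organization."""
    members = defaultdict(set)
    for person, org in edges:
        members[org].add(person)
    pairs = set()
    for people in members.values():
        for a, b in combinations(sorted(people), 2):
            pairs.add((a, b))
    return pairs


# Toy data (invented for illustration, not taken from the datasets).
gold = {
    ("Alice", "Guild A"),
    ("Bob", "Guild A"),
    ("Bob", "Guild B"),
    ("Carol", "Guild B"),
}
pred = {
    ("Alice", "Guild A"),
    ("Bob", "Guild A"),
    ("Carol", "Guild A"),  # extraction error: wrong organization
}

precision, recall, f1 = edge_prf(gold, pred)
gold_pairs = co_membership(gold)
pred_pairs = co_membership(pred)
spurious_pairs = pred_pairs - gold_pairs

print(f"edge P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
print("spurious person pairs:", sorted(spurious_pairs))
```

In this toy case a single misattributed edge lowers edge-level precision and also introduces a person-to-person link (Alice and Carol) that does not exist in the gold graph, which is the kind of propagation from extraction errors to graph-level analysis the abstract refers to.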

Details

Paper ID
lrec2026-main-615
Pages
pp. 7744-7754
BibKey
cai-etal-2026-relation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Erica Cai

  • Sean Mcquade

  • Kevin Young

  • Brendan O'Connor

Links