Comparing LLM-Based Knowledge Graph Extraction Approaches on Literary Studies in Spanish: A Case Study on Orbis Tertius
Proceedings of Natural Scientific Language Processing (NSLP) @ LREC 2026
Abstract
Knowledge graph construction from scholarly text increasingly relies on large language models, yet different extraction architectures produce different graphs. Literary studies poses particular challenges: meaning is interpretive rather than factual, and the boundaries of relevant knowledge are set by hermeneutic frameworks rather than by empirical verification. We compare two LLM-based extraction frameworks, entity-anchored extraction (KGGen) and open extraction with schema canonicalization (EDC), on 472 Spanish-language literary studies articles from Orbis Tertius (1996–2024). Despite fundamental architectural differences, both methods converge on key findings: cultural framing outweighs textual framing in literary discourse by a factor of 2.2–2.5 (p < .001), and core author networks remain consistent across approaches. The methods diverge in entity composition: KGGen captures more proper names (40.7% vs. 18.7% for EDC), while EDC captures more abstract concepts (42.8%) and preserves Spanish predicates, with 21,025 semantic definitions. We argue that findings on which architecturally different methods converge merit higher confidence, and we identify methodological considerations for knowledge graph construction from humanities scholarship.