Linguistic Knowledge Graphs for Sense Prediction: A Case-study on Latin
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
This paper investigates the integration of Linguistic Knowledge Graphs (LKGs) and Large Language Models (LLMs) for word sense prediction in Latin, a morphologically rich and low-resource historical language. Building on recent work in word sense disambiguation (WSD) and semantic change detection, we use an LKG that integrates information from a diachronic Latin corpus, a sense-annotated dataset of Latin, Latin WordNet, and Wikidata, as a structured representation of semantic and contextual relations. We frame sense prediction as a binary classification task over the Latin dataset, using a Graph Retrieval-Augmented Generation approach that combines knowledge graph retrieval with LLM prompting. Two types of graph metadata are tested: author-related information (work, period, occupation) and linguistic metadata (the synset and hypernyms derived from WordNet for each word sense). Experiments conducted on GPT-4o-mini, LLaMA-3.1-8B, and LLaMA-3.3-70B show varying performance, with F1 scores ranging from 0.53 to 0.77. While GPT-4o-mini achieves the best overall accuracy, LLaMA-3.3-70B benefits the most from graph-based metadata, improving its F1 score by up to 3 points. Analysis by word type reveals that concrete and semantically shifting words are more easily disambiguated than abstract and semantically stable ones. These results highlight both the promise and the challenges of combining graph-structured linguistic knowledge with LLMs for historical WSD.