Ithaca Revisited: Benchmarking a Domain-Specific Model for Epigraphy in the Age of LLMs
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
The restoration and interpretation of fragmentary inscriptions remain central challenges in epigraphy, where scholars must reconstruct missing text and determine an inscription’s provenance and chronology from limited evidence. Ithaca, a neural model introduced in 2022, represented a landmark advance in this field, achieving highly accurate results in text restoration and spatio-temporal attribution. Since then, general-purpose large language models (LLMs) such as GPT, Claude, and Gemini have demonstrated remarkable versatility across many domains, raising the question of whether specialized architectures like Ithaca are still required. In this paper, we revisit Ithaca with a dual focus. First, we benchmark it against GPT-5, finding that Ithaca continues to substantially outperform a state-of-the-art general-purpose LLM used in a retrieval-augmented in-context learning setting. Second, we systematically characterize Ithaca’s behavior under varying conditions, including lacuna size and position, inscription origin, and semantic topic; statistical analyses reveal consistent strengths and weaknesses across these conditions. Taken together, our results map Ithaca’s performance profile, enabling more informed use in research and teaching.