Towards a Linguistic Linked Open Data Resource for Italian Cultural Heritage: The Lessico Dei Beni Culturali Corpus
Proceedings of the 10th Workshop on Linked Data in Linguistics (LDL-2026)
Abstract
We present an ongoing effort to bridge the Lessico dei Beni Culturali (LBC), a multilingual lexicographic project covering Italian cultural heritage terminology, with the Linguistic Linked Open Data (LLOD) ecosystem. The LBC corpus spans five centuries of art-historical writing, from fifteenth- and sixteenth-century treatises by Alberti, Leonardo, and Vasari to nineteenth-century works by Stendhal and Burckhardt and contemporary tourist guides to Florence, with source texts in several European languages alongside their translations. The resource has already undergone automatic linguistic annotation and term extraction, but it lacks a structured lexical representation in any standard LLOD formalism. We describe the current state of the resource and identify the main challenges for its publication as Linked Data, including the modelling of culturally bound terms (realia), historical proper nouns, and multilingual source texts of different registers. We then outline a roadmap towards its representation in OntoLex-Lemon (McCrae et al., 2017) and its alignment with existing LLOD resources such as the Getty Vocabularies (Getty Research Institute, 2024a) and Wikidata (Vrandečić and Krötzsch, 2014). By sharing this work with the LLOD community, we hope to gather input on best practices for historical-artistic and cultural heritage lexicons, with the aim of improving interoperability between resources of different provenance, enabling new information to be derived, and increasing the value of existing data.