Terminology-Augmented Generation for Intangible Cultural Heritage: A Controlled LLM-Based Translation Framework
Proceedings of the 2nd Workshop on Evaluating Text Difficulty in a Multilingual Context (DeTermIt! 2026)
Abstract
This study examines the integration of a bilingual Italian–Spanish concept-oriented terminological resource into a controlled large language model (LLM) translation workflow within the domain of Campanian gastronomy. The termbase encodes structured conceptual, linguistic, and translational metadata, including grammatical information, translation strategies, and genre-sensitive usage recommendations. Through a local Model Context Protocol (MCP) architecture, the resource is dynamically connected to locally deployed LLMs, enabling the automatic identification and retrieval of relevant terminological units before generation. The system combines in-context terminological injection with deterministic post-processing enforcement: genre-specific policies are injected into the model prompt, and a rule-based post-processing layer then verifies the output and enforces surface-level terminological consistency. Two open-weight models, Mistral 7B Instruct and Gemma3 4B, are evaluated under three conditions across three discursive genres on a dataset of authentic texts. The findings suggest that combining terminological injection with deterministic enforcement can improve terminological compliance in controlled, domain-specific settings, while also revealing differences in instruction-following behavior across models and genres.
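The abstract does not specify the termbase schema, the term-matching method, or the enforcement rules, so the following Python sketch only illustrates the general inject-then-enforce pattern it describes. Everything in it is a hypothetical assumption: the `TermEntry` record and its fields, the helper names `retrieve_terms`, `build_prompt`, and `enforce`, and the whole-word regular-expression matching. The MCP transport and the actual LLM call are omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """Hypothetical termbase record; the real resource's schema is not given here."""
    source: str                                   # Italian term
    target: str                                   # preferred Spanish rendering
    # Hypothetical: genre -> strategy note injected into the prompt.
    policies: dict[str, str] = field(default_factory=dict)
    # Hypothetical: surface variants the post-processor rewrites to `target`.
    forbidden: list[str] = field(default_factory=list)


def _word_pattern(term: str) -> re.Pattern:
    # Case-insensitive whole-word match; assumes terms start and end with
    # word characters, since \b relies on a word/non-word boundary.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def retrieve_terms(text: str, termbase: list[TermEntry]) -> list[TermEntry]:
    """Step 1: identify termbase entries whose source term occurs in the text."""
    return [e for e in termbase if _word_pattern(e.source).search(text)]


def build_prompt(text: str, entries: list[TermEntry], genre: str) -> str:
    """Step 2: inject genre-specific terminological policies before the text."""
    lines = [f"Translate the following Italian {genre} text into Spanish.",
             "Mandatory terminology:"]
    for e in entries:
        policy = e.policies.get(genre, "use the preferred equivalent")
        lines.append(f"- {e.source} -> {e.target} ({policy})")
    lines += ["", "Text:", text]
    return "\n".join(lines)


def enforce(output: str, entries: list[TermEntry]) -> tuple[str, list[str]]:
    """Step 3: rewrite forbidden variants deterministically, then list the
    source terms whose preferred target form is still absent."""
    missing = []
    for e in entries:
        for variant in e.forbidden:
            # A function replacement inserts the target literally, so any
            # backslashes in it are not treated as regex escapes.
            output = _word_pattern(variant).sub(lambda _m: e.target, output)
        if not _word_pattern(e.target).search(output):
            missing.append(e.source)
    return output, missing
```

With a hypothetical entry `TermEntry("pastiera", "pastiera", {"recipe": "keep as borrowing"}, ["tarta de trigo"])`, a model output such as "La tarta de trigo se prepara en Pascua." is rewritten to "La pastiera se prepara en Pascua." and no terms are reported missing. Separating `enforce` from the prompt keeps the surface-level guarantee independent of how well a given model follows the injected instructions. This mirrors the distinction the abstract draws between injection and deterministic enforcement.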