Bridging Text-to-Sign Translation via Codebook-Oriented Pretraining
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Sign Language Production (SLP), the automatic translation from spoken languages to sign languages, faces considerable challenges due to the intricate mapping between linguistic semantics and the spatio-temporal motion domain. Existing SLP methods that combine a transformer model with Vector Quantization (VQ) exhibit poor translation performance because the semantic alignment between the codebook and the text representation is weak. In this work, we propose a novel text-to-sign translation framework based on model pretraining, which enhances semantic alignment by inheriting codebook-oriented prior knowledge from masked self-supervised models. Our approach involves two stages: (i) transforming sign language into discrete values by applying VQ with masked self-attention learning, creating pretext tasks that bridge the semantic gap between text and codebook representations; (ii) constructing an end-to-end encoder-decoder architecture that inherits the parameters of the first-stage model. Together, these designs form a robust sign language representation and significantly improve the translation model, which surpasses prior baselines.
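To make the two-stage pipeline concrete, the following is a minimal PyTorch sketch; the class names (SignVQ, MaskedSignEncoder), all hyperparameters, and the straight-through VQ variant are illustrative assumptions rather than the authors' implementation, and the usual VQ commitment/reconstruction losses are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignVQ(nn.Module):
    """Stage (i): map continuous pose features to discrete codebook tokens."""

    def __init__(self, num_codes: int = 512, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (B, T, dim) continuous sign/pose encodings.
        # Squared Euclidean distance to every codebook entry (VQ-VAE style).
        dist = (z.pow(2).sum(-1, keepdim=True)
                - 2 * z @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(-1))
        idx = dist.argmin(-1)                      # (B, T) discrete sign tokens
        q = self.codebook(idx)
        # Straight-through estimator so gradients can flow to the encoder;
        # commitment/codebook losses are left out of this sketch.
        q = z + (q - z).detach()
        return q, idx


class MaskedSignEncoder(nn.Module):
    """Masked-token pretext task over the quantized sign sequence."""

    def __init__(self, num_codes=512, dim=256, heads=4, layers=4):
        super().__init__()
        self.mask_id = num_codes                   # extra embedding row = [MASK]
        self.embed = nn.Embedding(num_codes + 1, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, idx: torch.Tensor, mask_ratio: float = 0.15):
        # Randomly replace tokens with [MASK] and predict the originals,
        # BERT-style, so the encoder learns codebook-oriented priors.
        mask = torch.rand(idx.shape, device=idx.device) < mask_ratio
        masked = idx.masked_fill(mask, self.mask_id)
        logits = self.head(self.encoder(self.embed(masked)))
        return F.cross_entropy(logits[mask], idx[mask])


if __name__ == "__main__":
    vq, pretrained = SignVQ(), MaskedSignEncoder()
    z = torch.randn(2, 32, 256)                    # toy batch of pose features
    _, idx = vq(z)
    loss = pretrained(idx)                         # stage-(i) pretraining loss
    loss.backward()

    # Stage (ii): warm-start the translation model's sign-side stack by
    # inheriting the pretrained encoder parameters before end-to-end training.
    sign_stack = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(256, 4, batch_first=True), num_layers=4)
    sign_stack.load_state_dict(pretrained.encoder.state_dict())
```

The final load_state_dict call illustrates the parameter inheritance step: the stage-(ii) encoder-decoder starts from the masked model's weights, which is how the codebook-oriented prior knowledge is carried into end-to-end text-to-sign training.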