Capturing Methodology for Generating Synthetic and 3D Training Data in Catalan Sign Language (LSC): The Case of Verbal Agreement
Proceedings of the LREC 2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion
Abstract
This paper proposes a hybrid methodology for generating high-quality synthetic data. Unlike approaches based purely on generative Artificial Intelligence, which may suffer from hallucinations or inconsistent movements, this project uses 3D biomechanical and kinematic algorithms that enforce the anatomical constraints of the human body, ensuring physically plausible movements. The aim of this research is to demonstrate that an LSC training dataset can be expanded synthetically. In particular, this paper focuses on verb agreement, a grammatical domain known for its morphological and articulatory complexity. By concentrating on the possible configurations of movement in signing space when expressing different person-agreeing verb forms, we aim to capture real movements, extract their physical parameters, and apply them as logical rules (similar to those of a video game engine) to automatically synthesize thousands of new conjugations from infinitives with complete anatomical precision. Beyond spatial conjugation, the methodology further augments the data through procedural variation of prosody and body morphology.
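To make the idea of rule-based conjugation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an agreeing form can be approximated as a wrist path from a subject locus to an object locus in signing space. The locus coordinates, the shoulder position, the smoothstep easing (a stand-in for a captured velocity profile) and the spherical reach limit (the simplest possible anatomical constraint, far cruder than full inverse kinematics) are all hypothetical choices made for illustration. The parameters `arm_length` and `duration_s` illustrate how body morphology and prosody could be varied procedurally. Hand orientation and facing, which also participate in agreement, are not modelled here.

```python
import math
from itertools import permutations

# Hypothetical person loci in signing space (metres, relative to the sternum):
# x = signer's right (+) / left (-), y = up, z = forward.
LOCI = {
    "1": (0.0, 0.0, 0.10),     # near the signer
    "2": (0.0, 0.0, 0.50),     # toward the addressee
    "3a": (0.40, 0.0, 0.45),   # ipsilateral third-person locus
    "3b": (-0.40, 0.0, 0.45),  # contralateral third-person locus
}
SHOULDER = (0.18, 0.25, 0.0)  # hypothetical right-shoulder position


def clamp_reach(point, shoulder, arm_length):
    """Pull a wrist target back onto the reach sphere if it lies beyond arm length."""
    d = [p - s for p, s in zip(point, shoulder)]
    dist = math.sqrt(sum(c * c for c in d))
    if dist <= arm_length:
        return point
    k = arm_length / dist
    return tuple(s + c * k for s, c in zip(shoulder, d))


def conjugate(subject, obj, arm_length=0.7, duration_s=0.5, fps=30, arc_height=0.05):
    """Synthesize a wrist trajectory moving from the subject locus to the object locus."""
    start, end = LOCI[subject], LOCI[obj]
    n = max(2, round(duration_s * fps))  # duration (prosody) sets the frame count
    frames = []
    for i in range(n):
        t = i / (n - 1)
        s = t * t * (3 - 2 * t)  # smoothstep easing: slow onset and offset
        p = [a + (b - a) * s for a, b in zip(start, end)]
        p[1] += arc_height * math.sin(math.pi * s)  # small upward arc along the path
        # Enforce the (hypothetical) reach constraint on every frame.
        frames.append(clamp_reach(tuple(p), SHOULDER, arm_length))
    return frames


# One trajectory per ordered subject/object pair of distinct loci (12 pairs).
forms = {(s, o): conjugate(s, o) for s, o in permutations(LOCI, 2)}
```

Calling `conjugate` again with other values of `arm_length` or `duration_s` multiplies the 12 base forms into further variants, which is the sense in which a small set of captured parameters could be expanded into many training samples. Note that the contralateral locus `3b` lies beyond the assumed reach, so paths toward it are pulled back onto the reach sphere rather than reaching the locus exactly.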