Perceptual Validation of 3D Pose-Guided Sign Language Synthesis
Proceedings of the LREC 2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion
Abstract
Sign language corpora face a structural tension between open-access requirements and the irreducible biometric identity embedded in visual-gestural data. While 3D pose estimation enables signer-agnostic abstraction, the adequacy of pose-based representations for preserving linguistic structure remains underexplored. This paper introduces a perceptually grounded kinematic modeling framework that formalizes 3D landmark sequences as an intermediate linguistic representation and validates their adequacy through avatar-mediated synthesis and large-scale human evaluation. Using 30,370 gloss-level Kenyan Sign Language (KSL) segments derived from the AI4KSL corpus, we construct normalized 3D motion trajectories via MediaPipe Holistic. These trajectories are retargeted to parameterized avatars through a constrained kinematic mapping that preserves non-manual marker geometry and articulatory timing. We define a dual evaluation paradigm that combines geometric fidelity metrics (PCK=92.7%, OKS=0.88, PCP=91.5%, PDJ>85.3%) with perceptual constructs measured across a statistically powered cohort of Deaf participants (N=384). Results show a strong positive correlation between structural joint precision and perceived gesture clarity (r=0.76, p<.01), suggesting that linguistic adequacy is partially recoverable from normalized kinematic structure. Furthermore, representational diversity in avatar instantiation significantly increases perceived inclusivity without degrading intelligibility. These findings establish pose-based motion abstraction not merely as an anonymization technique but as a viable corpus-level modeling layer for ethically sustainable language in motion.
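To make the two kinds of quantity in the abstract concrete, the following is a minimal sketch of how a PCK score and a Pearson correlation are commonly computed. It is not the evaluation code used in this paper: the function names `pck` and `pearson_r`, the threshold `alpha=0.2`, and the per-frame reference length `ref_len` (for example a shoulder width or bounding-box size) are illustrative assumptions, since the abstract does not state the threshold or the normalization used.

```python
import numpy as np

def pck(pred, gt, ref_len, alpha=0.2):
    """Percentage of Correct Keypoints (hypothetical helper).

    pred, gt : arrays of shape (frames, joints, 3) with 3D landmarks.
    ref_len  : array of shape (frames,), a per-frame reference length
               (assumed here; the paper's normalization is not specified).
    alpha    : fraction of ref_len within which a joint counts as correct
               (0.2 is an illustrative default, not the paper's setting).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean error per joint and frame, shape (frames, joints)
    dist = np.linalg.norm(pred - gt, axis=-1)
    # Per-frame threshold, broadcast across joints
    thresh = alpha * np.asarray(ref_len, dtype=float)[:, None]
    return float(np.mean(dist <= thresh))

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1D score vectors."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

if __name__ == "__main__":
    gt = np.zeros((2, 4, 3))      # 2 frames, 4 joints (toy data)
    pred = gt.copy()
    pred[0, 0, 0] = 1.0           # one joint is off by 1 unit
    print(pck(pred, gt, ref_len=[1.0, 1.0]))  # 7 of 8 joints within threshold
```

In a study of this design, `pearson_r` would be applied to paired per-item values, such as a joint-precision score and a mean clarity rating for each stimulus; how those pairs are formed in the paper is not described in the abstract.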