An Annotation Formalism for a French–LSF Bilingual Corpus Supporting Sign Language Generation
Proceedings of the LREC 2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion
Abstract
This paper introduces an annotation formalism for bilingual corpora of written French and French Sign Language (LSF), based on a manually produced expert transcription of LSF video data. The formalism captures the grammatical specificities of LSF, including spatial and iconic mechanisms, while explicitly encoding features that support motor programs for animated signing avatars. We propose a parameterized gloss-based approach, called PGloss-LSF, which integrates syntactic and semantic structures with the motion features critical for accurate sign synthesis. We illustrate the framework with examples drawn from our bilingual corpus. The annotation process is incremental, and its internal consistency and computational tractability are checked through a two-step evaluation: a qualitative assessment that aligns generated signs with the annotation language, and a quantitative evaluation via automatic translation using large language models. By bridging the linguistic specificities of sign language with the computational requirements of sign synthesis, this work advances the integration of sign language corpora into multilingual resources and contributes to the standardization of sign language technologies.