
How Much Data Is Enough Data? A New Motion Capture Corpus for Probabilistic Sign Language Generation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5pmyrs7f9o33

Abstract

We present STS Mocap v1, a new 4.1-hour, high-quality motion capture dataset for Swedish Sign Language. The dataset consists of high-quality multimodal data: body tracked with markers, fingers tracked with Manus Quantum Metagloves, face tracked with the iPhone LiveLink app in MetaHuman Animator mode, and corresponding sentence-level text translations into spoken Swedish. Using this dataset, we show that four hours of motion capture data is enough for generative modeling of sign language conditioned on 2D pose. In comparison, training the same flow-matching model on only 30 minutes of this data, a common size for sign language motion capture datasets, leads to a significant degradation in the quality of the synthesized data.

Details

Paper ID
lrec2026-main-750
Pages
pp. 9549-9558
BibKey
klezovich-etal-2026-how
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Anna Klezovich

  • Johanna Mesch

  • Gustav Eje Henter

  • Jonas Beskow

Links