Comparison of Low Bitrate Quantizers for Encoding Swedish Sign Language

Proceedings of the LREC 2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion

Abstract

This paper investigates the bitrate–distortion trade-off of different discrete representations for Swedish Sign Language (STS) using the STS Mocap v1 motion capture dataset. We compare the K-Means algorithm with the Residual Vector Quantized Variational Autoencoder (RQ-VAE) to determine how efficiently each method preserves salient motion information at low bitrates. The results show that RQ-VAE consistently achieves lower reconstruction error than K-Means at matched bitrates, particularly for body motion, and better preserves the volume of the signing space. We further demonstrate that the quantized representations can serve as conditioning for a flow-matching generative model, which at low bitrates produces plausible but still imperfect sign sequences. These findings highlight the advantages of residual vector-quantized models over K-Means for efficient sign language motion encoding.
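
To make "matching bitrates" concrete, the following is a minimal sketch of how the nominal bitrate of a flat codebook (as produced by K-Means) can be compared with that of a multi-level residual quantizer. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names, the 30 Hz frame rate, and the tiny hand-written codebooks are all hypothetical, and only the residual quantization step is shown, without the learned encoder and decoder of an RQ-VAE.

```python
import math
import numpy as np


def bits_per_frame(codebook_size: int, num_levels: int = 1) -> float:
    """Nominal bits needed to index one frame (fixed-length codes, no entropy coding)."""
    return num_levels * math.log2(codebook_size)


def bitrate_bps(codebook_size: int, num_levels: int, frame_rate_hz: float) -> float:
    """Nominal bitrate in bits per second; frame_rate_hz is an assumed token rate."""
    return bits_per_frame(codebook_size, num_levels) * frame_rate_hz


def nearest_code(x: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the closest codebook entry (squared Euclidean) for each row of x."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def residual_quantize(x: np.ndarray, codebooks: list) -> tuple:
    """Quantize x level by level, each level encoding what the previous levels missed."""
    recon = np.zeros_like(x)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(x - recon, codebook)  # quantize the current residual
        recon = recon + codebook[idx]            # accumulate the reconstruction
        indices.append(idx)
    return np.stack(indices, axis=1), recon


# A flat codebook of 256 entries and two residual levels of 16 entries each
# both cost 8 bits per frame, so they can be compared at the same bitrate.
flat_bits = bits_per_frame(256)
residual_bits = bits_per_frame(16, num_levels=2)

# Hypothetical, untrained codebooks purely to show the mechanics.
frames = np.array([[1.2, 0.1]])
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, -1.0], [0.0, 0.0]]),
]
codes, reconstruction = residual_quantize(frames, codebooks)
```

In this sketch, a single frame is represented by one index per level, so the bitrate grows linearly with the number of levels while the number of reachable reconstruction points grows exponentially.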