A Resource and Evaluation Method for Phonological Continuity in Japanese Sign Language
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Computational models for sign language processing often represent phonological components as discrete categories. This approach does not adequately capture the continuous nature of sign articulation and obscures fine-grained phonetic variation. Furthermore, the field has lacked resources and standardized methods for evaluating a model’s ability to represent this continuity. In this work, we address both limitations. First, we introduce the JSL Ordered Triplet Dataset, a new manually annotated resource designed to benchmark the modeling of gradual phonological progressions in Japanese Sign Language. Second, we propose a learning framework that reframes the task from classification to ranking, using Positive-Unlabeled (PU) learning to optimize the Area Under the ROC Curve (AUC). Our intrinsic evaluation on the new dataset shows that the learned continuous embeddings significantly outperform a cross-entropy baseline in ordering intermediate forms, improving the average accuracy on the continuity ranking task across phonological components from 81.52% to 91.71%. These embeddings also maintain strong discriminative power for standard component classification. This work provides the community with a resource and a method for learning and evaluating more linguistically grounded representations of sign language.
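To make the ranking view concrete, the following is a minimal sketch and not the implementation used in this work. It assumes each sign instance is mapped to a single scalar score per phonological component. It computes the empirical AUC as the fraction of (positive, unlabeled) pairs ranked correctly, and then a simple triplet ordering accuracy under the assumption that an intermediate form should score between its two endpoints. The function names `pairwise_auc`, `triplet_ordered`, and `triplet_accuracy`, and the "between the endpoints" criterion, are hypothetical choices made for illustration only.

```python
def pairwise_auc(pos_scores, unl_scores):
    """Empirical AUC between positive and unlabeled scores.

    Counts the fraction of (positive, unlabeled) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    total = 0.0
    for p in pos_scores:
        for u in unl_scores:
            if p > u:
                total += 1.0
            elif p == u:
                total += 0.5
    return total / (len(pos_scores) * len(unl_scores))


def triplet_ordered(score_a, score_mid, score_b):
    """Assumed criterion: the intermediate score lies strictly between
    the scores of the two endpoint forms."""
    return min(score_a, score_b) < score_mid < max(score_a, score_b)


def triplet_accuracy(triplets):
    """Share of (endpoint, intermediate, endpoint) score triplets that
    satisfy the assumed ordering criterion."""
    return sum(triplet_ordered(*t) for t in triplets) / len(triplets)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    print(pairwise_auc([0.9, 0.8], [0.1, 0.85]))
    print(triplet_accuracy([(0.1, 0.5, 0.9), (0.9, 0.4, 0.2), (0.1, 0.95, 0.9)]))
```

Framing the objective over pairs rather than per-example labels is what allows unlabeled examples to be used directly: the score only needs to rank known positives above unlabeled instances, not assign each instance to a fixed category.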