LREC 2026 main

Evaluation of Co-Speech Gesture Tracking Techniques in Naturalistic Interactions

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2k233c5pnfsi

Abstract

Hand gestures convey a significant portion of communicative meaning, making multimodal datasets essential for interaction research. However, annotating gestures remains a time-consuming and challenging task. To speed up the process, semi-automatic methods have been developed that identify segments containing hand movement, which annotators then refine. These methods typically combine a pose estimation model with a rule-based or statistical movement detection algorithm. Yet most are validated on idealised, non-naturalistic datasets with minimal hand occlusion. We benchmark combinations of four pose estimation methods (OpenPose, MediaPipe, DeepLabCut, and Kinect) and two rule-based movement detection algorithms on two naturalistic, conversational datasets. The best pipelines combine the SPUDNIG displacement algorithm with OpenPose on MULTISIMO and with DeepLabCut on ECOLANG. These pipelines achieved Tversky scores of 0.57 on MULTISIMO and 0.65 on ECOLANG, with recall scores of 0.73 and 0.78, respectively. While off-the-shelf gesture detection systems can support annotation, their performance remains limited on naturalistic data, and a careful camera setup that minimises occlusions is essential.
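The abstract evaluates pipelines that turn pose-estimator keypoints into movement segments and scores them with the Tversky index, S = TP / (TP + α·FP + β·FN), and with recall, TP / (TP + FN). A minimal sketch of that idea in Python follows, assuming frame-level scoring, a hypothetical pixel displacement threshold, a hypothetical minimum run length, and weights α = β = 0.5 (which reduce the Tversky index to the Dice coefficient). The function names are hypothetical, and the detection rule is a simplified displacement threshold, not the SPUDNIG algorithm or the authors' evaluation code.

```python
import numpy as np


def detect_movement(keypoints, threshold=5.0, min_frames=3):
    """Hypothetical simplified displacement rule (not SPUDNIG itself).

    keypoints: array of shape (n_frames, n_points, 2) with x, y pixel coords.
    A frame is flagged as moving if any keypoint moved more than `threshold`
    pixels since the previous frame; runs shorter than `min_frames` are dropped.
    Missing keypoints (NaN) never exceed the threshold, so they count as still.
    """
    kp = np.asarray(keypoints, dtype=float)
    # Euclidean displacement of each keypoint between consecutive frames.
    disp = np.linalg.norm(np.diff(kp, axis=0), axis=2)  # (n_frames - 1, n_points)
    moving = np.zeros(kp.shape[0], dtype=bool)
    moving[1:] = (disp > threshold).any(axis=1)

    # Remove short runs of movement, which are likely tracking jitter.
    result = moving.copy()
    start = None
    for i, m in enumerate(np.append(moving, False)):  # sentinel closes last run
        if m and start is None:
            start = i
        elif not m and start is not None:
            if i - start < min_frames:
                result[start:i] = False
            start = None
    return result


def tversky(pred, ref, alpha=0.5, beta=0.5):
    """Frame-level Tversky index; alpha weights FP, beta weights FN (assumed)."""
    pred, ref = np.asarray(pred, dtype=bool), np.asarray(ref, dtype=bool)
    tp = np.sum(pred & ref)
    fp = np.sum(pred & ~ref)
    fn = np.sum(~pred & ref)
    denom = tp + alpha * fp + beta * fn
    # Convention: two empty annotations agree perfectly.
    return float(tp / denom) if denom > 0 else 1.0


def recall(pred, ref):
    """Fraction of reference movement frames that the detector recovered."""
    pred, ref = np.asarray(pred, dtype=bool), np.asarray(ref, dtype=bool)
    tp, fn = np.sum(pred & ref), np.sum(~pred & ref)
    return float(tp / (tp + fn)) if tp + fn > 0 else 1.0


if __name__ == "__main__":
    # Synthetic single-keypoint track: still, then moving 10 px/frame, then still.
    xs = [0, 0, 0, 0, 10, 20, 30, 40, 40, 40]
    kp = np.array([[[x, 0.0]] for x in xs])
    pred = detect_movement(kp)
    # Hypothetical reference annotation: gesture spans frames 3 to 8.
    ref = np.array([False] * 3 + [True] * 6 + [False])
    print(round(tversky(pred, ref), 2), round(recall(pred, ref), 2))  # 0.8 0.67
```

With α > β the index penalises false alarms more than misses, and with α < β the reverse; which weighting suits an annotation-support tool (where missed gestures may cost more than extra candidate segments) is a design choice, and the paper's actual setting is not stated in this abstract.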

Details

Paper ID
lrec2026-main-497
Pages
pp. 6277-6288
BibKey
ivanova-etal-2026-evaluation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Victoria Ivanova

  • Naomi Harte
