A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/337o5ju7qt5b

Abstract

Large datasets, as required for deep learning of lip reading, do not exist in many languages. In this paper we present GLips (German Lips), a dataset of 250,000 publicly available videos of the faces of speakers in the Hessian Parliament, processed for word-level lip reading using an automatic pipeline. The format is similar to that of the English-language LRW (Lip Reading in the Wild) dataset: each video encodes one word of interest in a context of 1.16 seconds, which makes the two datasets compatible for studying transfer learning between them. By training a deep neural network, we investigate whether lip reading has language-independent features, so that datasets of different languages can be used to improve lip reading models. We demonstrate learning from scratch and show that transfer learning from LRW to GLips and vice versa improves learning speed and performance, in particular on the validation set.

Details

Paper ID
lrec2022-main-737
Pages
pp. 6829-6836
BibKey
schwiebert-etal-2022-multimodal
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Gerald Schwiebert
  • Cornelius Weber
  • Leyuan Qu
  • Henrique Siqueira
  • Stefan Wermter

Links