BEA-Base: A Benchmark for ASR of Spontaneous Hungarian

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

Abstract

Hungarian is spoken by 15 million people, still, easily accessible Automatic Speech Recognition (ASR) benchmark datasets – especially for spontaneous speech – have been practically unavailable. In this paper, we introduce BEA-Base, a subset of the BEA spoken Hungarian database comprising mostly spontaneous speech of 140 speakers. It is built specifically to assess ASR, primarily for conversational AI applications. After defining the speech recognition subsets and task, several baselines – including classic HMM-DNN hybrid and end-to-end approaches augmented by cross-language transfer learning – are developed using open-source toolkits. The best results obtained are based on multilingual self-supervised pretraining, achieving a 45% recognition error rate reduction as compared to the classical approach – without the application of an external language model or additional supervised data. The results show the feasibility of using BEA-Base for training and evaluation of Hungarian speech recognition systems.

Resources

Details

Paper ID

lrec2022-main-211

Pages

pp. 1970-1977

DOI

10.63317/3gsn6yj7i7oq

BibKey

mihajlik-etal-2022-bea

Editors

Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis2020

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

79-10-95546-38-2

Conference

Thirteenth Language Resources and Evaluation Conference

Location

Marseille, France

Date

20 - 25 June 2022

Authors

PM
Peter Mihajlik
AB
Andras Balog
TG
Tekla Etelka Graczi
AK
Anna Kohari
BT
Balázs Tarján
KM
Katalin Mady

Links

URL

DOI