Back to Main Conference 2012
LREC 2012main

KALAKA-2: a TV Broadcast Speech Database for the Recognition of Iberian Languages in Clean and Noisy Environments

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/4meroe9f8dgx

Abstract

This paper presents the main features (design issues, recording setup, etc.) of KALAKA-2, a TV broadcast speech database specifically designed for the development and evaluation of language recognition systems in clean and noisy environments. KALAKA-2 was created to support the Albayzin 2010 Language Recognition Evaluation (LRE), organized by the Spanish Network on Speech Technologies from June to November 2010. The database features 6 target languages: Basque, Catalan, English, Galician, Portuguese and Spanish, and includes segments in other (Out-Of-Set) languages, which allow to perform open-set verification tests. The best performance attained in the Albayzin 2010 LRE is presented and briefly discussed. The performance of a state-of-the-art system in various tasks defined on the database is also presented. In both cases, results highlight the suitability of KALAKA-2 as a benchmark for the development and evaluation of language recognition technology.

Details

Paper ID
lrec2012-main-264
Pages
pp. 99-105
BibKey
rodriguez-fuentes-etal-2012-kalaka
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 27 May 2012

Authors

  • LR

    Luis Javier Rodríguez-Fuentes

  • MP

    Mikel Penagarikano

  • AV

    Amparo Varona

  • MD

    Mireia Diez

  • GB

    Germán Bordel

Links