TED-LIUM: an Automatic Speech Recognition dedicated corpus

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/48xafxbrk3qa

Abstract

This paper presents the corpus developed by the LIUM for Automatic Speech Recognition (ASR), based on the TED Talks. The corpus was built during the IWSLT 2011 Evaluation Campaign and comprises 118 hours of speech with accompanying automatically aligned transcripts. We describe the content of the corpus, how the data was collected and processed, how it will be made publicly available, and how we built an ASR system using this data, reaching a word error rate (WER) of 17.4%. The official results we obtained at the IWSLT 2011 evaluation campaign are also discussed.

Details

Paper ID
lrec2012-main-405
Pages
pp. 125-129
BibKey
rousseau-etal-2012-ted
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21-27 May 2012

Authors

  • Anthony Rousseau

  • Paul Deléglise

  • Yannick Estève
