Corpus description of the ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/4yi2fj2u6ohf

Abstract

This paper presents the audio corpus developed in the framework of the ESTER evaluation campaign of French broadcast news transcription systems. This corpus includes 100 hours of manually annotated recordings and 1,677 hours of non-transcribed data. The manual annotations include the detailed verbatim orthographic transcription, the speaker turns and identities, information about acoustic conditions, and named entities. Additional resources generated by automatic speech processing systems, such as phonetic alignments and word graphs, are also described.

Details

Paper ID
lrec2006-main-397
Pages
N/A
BibKey
galliano-etal-2006-corpus
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24 May 2006 to 26 May 2006

Authors

  • S. Galliano

  • E. Geoffrois

  • G. Gravier

  • J.-F. Bonastre

  • D. Mostefa

  • K. Choukri

Links