The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/3vpjfgnpmv83

Abstract

This paper describes the corpus of university lectures that has been recorded in European Portuguese, and some of the recognition experiments we have done with it. The highly specific topic domain and the spontaneous speech nature of the lectures are two of the most challenging problems. Lexical and language model adaptation proved difficult given the scarcity of domain material in Portuguese, but improvements can be achieved with unsupervised acoustic model adaptation. From the point of view of the study of spontaneous speech characteristics, namely disfluencies, the LECTRA corpus has also proved a very valuable resource.

Details

Paper ID
lrec2008-main-501
Pages
N/A
BibKey
trancoso-etal-2008-lectra
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28-30 May 2008

Authors

  • Isabel Trancoso
  • Rui Martins
  • Helena Moniz
  • Ana Isabel Mata
  • M. Céu Viana

Links