Summary of the paper

Title The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese
Authors Isabel Trancoso, Rui Martins, Helena Moniz, Ana Isabel Mata and M. Céu Viana
Abstract This paper describes the corpus of university lectures that has been recorded in European Portuguese, and some of the recognition experiments we have done with it. The highly specific topic domain and the spontaneous speech nature of the lectures are two of the most challenging problems. Lexical and language model adaptation proved difficult given the scarcity of domain material in Portuguese, but improvements can be achieved with unsupervised acoustic model adaptation. From the point of view of the study of spontaneous speech characteristics, namely disfluencies, the LECTRA corpus has also proved a very valuable resource.
Language
Topics Speech resource/database, Speech recognition and understanding, Corpus (creation, annotation, etc.)
Full paper The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese
Slides -
Bibtex @InProceedings{TRANCOSO08.359,
  author = {Isabel Trancoso and Rui Martins and Helena Moniz and Ana Isabel Mata and M. Céu Viana},
  title = {The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }
