Summary of the paper

Title Thai Broadcast News Corpus Construction and Evaluation
Authors Markpong Jongtaveesataporn, Chai Wutiwiwatchai, Koji Iwano and Sadaoki Furui
Abstract Large speech and text corpora are crucial to the development of a state-of-the-art speech recognition system. This paper reports on the construction and evaluation of the first Thai broadcast news speech and text corpora. Specifications and conventions used in the transcription process are described in the paper. The speech corpus contains about 17 hours of speech data, while the text corpus was transcribed from around 35 hours of television broadcast news. The characteristics of the corpus are analyzed and presented in the paper. The speech corpus was split according to the evaluation focus conditions used in the DARPA Hub-4 evaluation. An 18K-word Thai speech recognition system was set up and tested on this speech corpus as a preliminary experiment. Acoustic model adaptation was performed to improve system performance. The best system yielded a word error rate of about 20% for clean and planned speech, and below 30% for the overall condition.
Language Single language
Topics Speech resource/database, Speech recognition and understanding, Corpus (creation, annotation, etc.)
Full paper Thai Broadcast News Corpus Construction and Evaluation
Slides Thai Broadcast News Corpus Construction and Evaluation
Bibtex @InProceedings{JONGTAVEESATAPORN08.319,
  author = {Markpong Jongtaveesataporn and Chai Wutiwiwatchai and Koji Iwano and Sadaoki Furui},
  title = {Thai Broadcast News Corpus Construction and Evaluation},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }
