Back to Main Conference 2010
LREC 2010main

Language Modeling Approach for Retrieving Passages in Lecture Audio Data

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/44vb52iorvr3

Abstract

Spoken Document Retrieval (SDR) is a promising technology for enhancing the utility of spoken materials. After the spoken documents have been transcribed by using a Large Vocabulary Continuous Speech Recognition (LVCSR) decoder, a text-based ad hoc retrieval method can be applied directly to the transcribed documents. However, recognition errors will significantly degrade the retrieval performance. To address this problem, we have previously proposed a method that aimed to fill the gap between automatically transcribed text and correctly transcribed text by using a statistical translation technique. In this paper, we extend the method by (1) using neighboring context to index the target passage, and (2) applying a language modeling approach for document retrieval. Our experimental evaluation shows that context information can improve retrieval performance, and that the language modeling approach is effective in incorporating context information into the proposed SDR method, which uses a translation model.

Details

Paper ID
lrec2010-main-319
Pages
N/A
BibKey
honda-akiba-2010-language
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 23 May 2010

Authors

  • KH

    Koichiro Honda

  • TA

    Tomoyosi Akiba

Links