Back to Main Conference 2008
LREC 2008main

Morphosyntactic Resources for Automatic Speech Recognition

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/39m9kmdge38a

Abstract

Texts generated by automatic speech recognition (ASR) systems have some specificities, related to the idiosyncrasies of oral productions or the principles of ASR systems, that make them more difficult to exploit than more conventional natural language written texts. This paper aims at studying the interest of morphosyntactic information as a useful resource for ASR. We show the ability of automatic methods to tag outputs of ASR systems, by obtaining a tag accuracy similar for automatic transcriptions to the 95-98 % usually reported for written texts, such as newspapers. We also demonstrate experimentally that tagging is useful to improve the quality of transcriptions by using morphosyntactic information in a post-processing stage of speech decoding. Indeed, we obtain a significant decrease of the word error rate with experiments done on French broadcast news from the ESTER corpus; we also notice an improvement of the sentence error rate and observe that a significant number of agreement errors are corrected.

Details

Paper ID
lrec2008-main-496
Pages
N/A
BibKey
huet-etal-2008-morphosyntactic
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • SH

    Stéphane Huet

  • GG

    Guillaume Gravier

  • PS

    Pascale Sébillot

Links