Proposal of a very-large-corpus acquisition method by cell-formed registration
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)
Abstract
One promising way to improve the performance of a speech translation system is to collect a large volume of data in the target tasks and domains. However, naïvely scaling up the traditional data collection scheme consumes valuable resources. Advanced speech recognition technology can provide a highly accurate recognizer if a machine-friendly speaking style is permitted. We propose a new data collection scheme that is supported by this speaking style. Preliminary results of data collection show that the proposed scheme has a three-digit efficiency.