Using a Parallel Transcript/Subtitle Corpus for Sentence Compression
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Abstract
This paper describes the construction and use of a parallel corpus that pairs transcripts of television programs with the subtitles of those same programs. The subtitles are aimed at hearing-impaired viewers and are in the same language as the programs (Dutch). Our goal is to convert transcripts into subtitles. We will use the corpus to learn how to perform sentence compression, in much the same way as Jing (2001).