Title

Using a Parallel Transcript/Subtitle Corpus for Sentence Compression

Author(s)

Vincent Vandeghinste (1), Erik Tjong Kim Sang (2)

(1) Centre for Computational Linguistics, KULeuven, Belgium; (2) CNTS, University of Antwerp, Belgium

Session

P1-W

Abstract

This paper describes the construction and use of a parallel corpus that pairs transcripts of television programs with the subtitles of those same programs. The subtitles were targeted at hearing-impaired people and are in the same language as the programs (Dutch). Our goal is to convert transcripts into subtitles. We will use the corpus to learn how to perform sentence compression, in much the same way as Jing (2001).

Keyword(s)

subtitling, sentence compression, sentence alignment, hearing-impaired, sentence reduction

Language(s)

Dutch

Full Paper

128.pdf