Punctuation Prediction for Unsegmented Transcript Based on Word Vector

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/39kpgdnbwmrw

Abstract

In this paper we propose an approach to predicting punctuation marks for unsegmented speech transcripts. The approach is purely lexical, with pre-trained word vectors as the only input. A Deep Neural Network (DNN) or Convolutional Neural Network (CNN) model is trained to classify whether a punctuation mark should be inserted after the third word of a five-word sequence and, if so, which kind of mark it should be. TED talks from the IWSLT dataset are used for both training and evaluation. The proposed approach achieves better results than the state-of-the-art lexical solution that works with the same type of data, especially when predicting punctuation position only.
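The five-word framing described in the abstract can be illustrated with a minimal sketch. It only shows how sliding windows and their labels might be extracted from a punctuated transcript; the label set, the helper names `split_words_and_labels` and `five_word_samples`, and the example sentence are assumptions made for illustration and are not taken from the paper. In the approach itself, each word in a window would then be replaced by its pre-trained word vector before being passed to the DNN or CNN classifier, a step this sketch leaves out.

```python
# Assumed label set for illustration; the abstract does not list the classes used.
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def split_words_and_labels(tokens):
    """Split a punctuated token stream into words and, for each word,
    the label of the punctuation mark that follows it ("O" if none)."""
    words, labels = [], []
    for tok in tokens:
        if tok in PUNCT:
            if words:  # attach the mark to the preceding word
                labels[-1] = PUNCT[tok]
        else:
            words.append(tok)
            labels.append("O")
    return words, labels


def five_word_samples(tokens, size=5, target=2):
    """Slide a window of `size` words over the text; each sample is labeled
    with what follows the word at index `target` (the third word by default)."""
    words, labels = split_words_and_labels(tokens)
    samples = []
    for start in range(len(words) - size + 1):
        window = words[start:start + size]
        samples.append((window, labels[start + target]))
    return samples


# Hypothetical example sentence, tokenized on whitespace.
tokens = "so that was it . thank you very much".split()
samples = five_word_samples(tokens)
for window, label in samples:
    print(" ".join(window), "->", label)
```

Only the window whose third word is "it" receives a punctuation label here; every other window is labeled "O", which reflects how a position-only prediction reduces to a binary decision per window.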

Details

Paper ID
lrec2016-main-103
Pages
pp. 654-658
BibKey
che-etal-2016-punctuation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Xiaoyin Che

  • Cheng Wang

  • Haojin Yang

  • Christoph Meinel
