
Revising the annotation of a Broadcast News corpus: a linguistic approach

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/4mr4k4h5iduj

Abstract

This paper presents a linguistic revision of a speech corpus of Portuguese broadcast news, focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process focused mainly on annotating and revising structural metadata events, such as disfluencies and punctuation marks. The revised data is now used extensively and has been of extreme importance for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system and all subsequent modules. The data has also recently been used in disfluency studies across domains.

Details

Paper ID
lrec2014-main-017
Pages
pp. 3908-3913
BibKey
cabarrao-etal-2014-revising
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 to 31 May 2014

Authors

  • Vera Cabarrão

  • Helena Moniz

  • Fernando Batista

  • Ricardo Ribeiro

  • Nuno Mamede

  • Hugo Meinedo

  • Isabel Trancoso

  • Ana Isabel Mata

  • David Martins de Matos
