LREC 2008 (main conference)

Quick Rich Transcriptions of Arabic Broadcast News Speech Data

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/5n4f99t2yuhc

Abstract

This paper describes the collection and transcription of a large set of Arabic broadcast news speech data. In total, more than 2,000 hours of data were transcribed. The transcription factor (the ratio of transcription time to audio duration) was reduced by using Quick Rich Transcription (QRTR) and by performing fewer quality-control passes on the data. The data was collected from several Arabic TV and radio sources and covers both Modern Standard Arabic and dialectal Arabic. The orthographic transcriptions include segmentation, speaker turns, topics, sentence-unit types, and minimal noise markup. The transcripts were produced as part of the GALE project.

Details

Paper ID
lrec2008-main-124
Pages
N/A
BibKey
bendahman-etal-2008-quick
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28–30 May 2008

Authors

  • Chomicha Bendahman
  • Meghan Glenn
  • Djamel Mostefa
  • Niklas Paulsson
  • Stephanie Strassel
