Back to Main Conference 2014
LREC 2014main

GRASS: the Graz corpus of Read And Spontaneous Speech

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/2zn9iz32di4n

Abstract

This paper provides a description of the preparation, the speakers, the recordings, and the creation of the orthographic transcriptions of the first large scale speech database for Austrian German. It contains approximately 1900 minutes of (read and spontaneous) speech produced by 38 speakers. The corpus consists of three components. First, the Conversation Speech (CS) component contains free conversations of one hour length between friends, colleagues, couples, or family members. Second, the Commands Component (CC) contains commands and keywords which were either read or elicited by pictures. Third, the Read Speech (RS) component contains phonetically balanced sentences and digits. The speech of all components has been recorded at super-wideband quality in a soundproof recording-studio with head-mounted microphones, large-diaphragm microphones, a laryngograph, and with a video camera. The orthographic transcriptions, which have been created and subsequently corrected manually, contain approximately 290 000 word tokens from 15 000 different word types.

Details

Paper ID
lrec2014-main-341
Pages
pp. 1465-1470
BibKey
schuppler-etal-2014-grass
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 31 May 2014

Authors

  • BS

    Barbara Schuppler

  • MH

    Martin Hagmueller

  • JM

    Juan A. Morales-Cordovilla

  • HP

    Hannes Pessentheiner

Links