
The RATS Collection: Supporting HLT Research with Degraded Audio Data

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/3khwb2byj8dr

Abstract

The DARPA RATS program was established to foster development of language technology systems that can perform well on speaker-to-speaker communications over radio channels that evince a wide range in the type and extent of signal variability and acoustic degradation. Creating suitable corpora to address this need poses an equally wide range of challenges for the collection, annotation and quality assessment of relevant data. This paper describes the LDC’s multi-year effort to build the RATS data collection, summarizes the content and properties of the resulting corpora, and discusses the novel problems and approaches involved in ensuring that the data would satisfy its intended use: providing speech recordings and annotations for training and evaluating HLT systems that perform four specific tasks on difficult radio channels, namely Speech Activity Detection (SAD), Language Identification (LID), Speaker Identification (SID) and Keyword Spotting (KWS).

Details

Paper ID
lrec2014-main-089
Pages
pp. 1970-1977
BibKey
graff-etal-2014-rats
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 to 31 May 2014

Authors

  • David Graff
  • Kevin Walker
  • Stephanie Strassel
  • Xiaoyi Ma
  • Karen Jones
  • Ann Sawyer
