CORILGA: a Galician Multilevel Annotated Speech Corpus for Linguistic Analysis

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/3fcczfqy3gqw

Abstract

This paper describes CORILGA (“Corpus Oral Informatizado da Lingua Galega”), a large, high-quality corpus of spoken Galician from the 1960s to the present day. It includes both formal and informal spoken language, from both standard and non-standard varieties, across different generations and social levels. The corpus will be made available to the research community upon completion. Galician is one of the EU languages that needs further research before highly effective language technology solutions can be implemented. A software repository for speech resources in Galician is also described. The repository includes a structured database, a graphical interface and processing tools. The database enables simple and fast searches based on a number of different criteria. The web-based user interface gives users easy access to the different materials. Last but not least, a set of transcription-based modules for automatic speech recognition has been developed, thus facilitating the orthographic labelling of the recordings.

Details

Paper ID
lrec2014-main-579
Pages
pp. 2653-2657
BibKey
garcia-mateo-etal-2014-corilga
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26–31 May 2014

Authors

  • Carmen García-Mateo
  • Antonio Cardenal
  • Xosé Luis Regueira
  • Elisa Fernández Rei
  • Marta Martinez
  • Roberto Seara
  • Rocío Varela
  • Noemí Basanta
