Bilingual Connections for Trilingual Corpora: An XML Approach

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/5cvjyzt9j2wd

Abstract

This paper describes the design and development of a trilingual spontaneous speech corpus for statistical speech-to-speech translation. The languages considered are Catalan, Spanish and US-English. This corpus has been built bearing in mind the strong need for multilingual collections of on-line data within the area of statistical translation, as well as the need for data that are parallel or aligned, that contain different types of linguistic information and that can be used by different translation systems. For that reason, our aim has been to create a linguistically enriched resource with an XML-based DTD that allows useful, transparent and flexible storage of the data. Moreover, these resources are also valuable for a wide range of Natural Language Processing applications, such as multilingual resource acquisition or word sense discrimination, among others.

Details

Paper ID
lrec2004-main-412
Pages
N/A
BibKey
arranz-etal-2004-bilingual
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 to 28 May 2004

Authors

  • Victoria Arranz

  • Núria Castell

  • Josep Maria Crego

  • Jesús Giménez

  • Adrià de Gispert

  • Patrik Lambert
