A translated corpus of 30,000 French SMS

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/5ccdvurbsrso

Abstract

The development of communication technologies has contributed to the appearance of new forms of written language, which scientists have to study in light of their peculiarities (typing or viewing constraints, synchronicity, etc.). In the particular case of SMS (Short Message Service), studies are complicated by a lack of data, mainly due to technical constraints and privacy considerations. In this paper, we present a corpus of 30,000 French SMS collected through a project in Belgium named “Faites don de vos SMS à la science” (Give your SMS to Science). This corpus is unique in its quality, its size and the fact that the SMS have been manually translated into “standard” French. We will first describe the collection process and discuss the writers' profiles. Then we will explain in detail how the translation was carried out.

Details

Paper ID
lrec2006-main-148
Pages
N/A
BibKey
fairon-paumier-2006-translated
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24-26 May 2006

Authors

  • Cédrick Fairon

  • Sébastien Paumier