A Corpus of Native, Non-native and Translated Texts

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

We describe a monolingual English corpus of original and (human) translated texts, with an accurate annotation of speaker properties, including the original language of the utterances and the speaker's country of origin. We thus obtain three sub-corpora of texts reflecting native English, non-native English, and English translated from a variety of European languages. This dataset will facilitate the investigation of similarities and differences between these kinds of sub-languages. Moreover, it will facilitate a unified comparative study of translations and language produced by (highly fluent) non-native speakers, two closely related phenomena that have so far been studied only in isolation.

Details

Paper ID

lrec2016-main-664

Pages

pp. 4197-4201

DOI

10.63317/3pjm6i8bptfu

BibKey

nisioi-etal-2016-corpus

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23 - 28 May 2016

Authors

Sergiu Nisioi
Ella Rabinovich
Liviu P. Dinu
Shuly Wintner
