Discriminating Similar Languages: Evaluations and Explorations

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties. We carried out a number of experiments using the results of the two editions of the Discriminating between Similar Languages (DSL) shared task. We investigate the progress made between the two tasks, estimate an upper bound on possible performance using ensemble and oracle combination, and provide learning curves to help us understand which languages are more challenging. A number of difficult sentences are identified and investigated further with human annotation.
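To make the ensemble and oracle combination mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each system's output is a list of predicted labels aligned with the gold labels. The function names (`plurality_vote`, `accuracy`, `oracle_accuracy`), the labels, and the predictions are all invented for illustration. A plurality vote stands in for the ensemble, and the oracle counts a sentence as correct when at least one system labels it correctly, which gives an upper bound on what any selection among these systems could achieve.

```python
from collections import Counter


def plurality_vote(predictions):
    """Combine systems by taking the most frequent label per sentence.

    `predictions` is a list of per-system label lists, all the same length.
    Ties are broken in favour of the earliest system in the list.
    """
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # First label, in system order, that reaches the top count.
        voted.append(next(label for label in labels if counts[label] == best))
    return voted


def accuracy(predicted, gold):
    """Fraction of sentences whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def oracle_accuracy(predictions, gold):
    """Fraction of sentences that at least one system labels correctly."""
    hits = sum(
        any(p == g for p in labels)
        for labels, g in zip(zip(*predictions), gold)
    )
    return hits / len(gold)


# Toy data, made up for illustration only.
gold = ["bs", "hr", "sr", "pt-BR", "pt-PT"]
system_outputs = [
    ["bs", "hr", "hr", "pt-BR", "pt-BR"],
    ["hr", "hr", "sr", "pt-PT", "pt-BR"],
    ["bs", "sr", "sr", "pt-PT", "pt-BR"],
]

ensemble = plurality_vote(system_outputs)
print("ensemble accuracy:", accuracy(ensemble, gold))
print("oracle accuracy:  ", oracle_accuracy(system_outputs, gold))
```

On this toy data the oracle scores higher than the vote because one sentence is labelled correctly by only a single system, and the last sentence is missed by every system, so even the oracle cannot recover it.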

Details

Paper ID

lrec2016-main-284

Pages

pp. 1800-1807

DOI

10.63317/4e2puhxuby8y

BibKey

goutte-etal-2016-discriminating

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23 - 28 May 2016

Authors

Cyril Goutte
Serge Léger
Shervin Malmasi
Marcos Zampieri
