Comparing the Level of Code-Switching in Corpora
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Abstract
Social media texts are often informal and conversational, and when produced by bilinguals they tend to mix several languages within the same text, in the same way as conversational speech. The recent availability of large social media corpora has thus also made large-scale code-switched resources available for research. The paper addresses the issues of evaluation and comparison that these new corpora entail by defining an objective measure of the corpus-level complexity of code-switched texts. It also shows how this formal measure can be used in practice by applying it to several code-switched corpora.