LREC 2018 main

Strategies and Challenges for Crowdsourcing Regional Dialect Perception Data for Swiss German and Swiss French

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/42yv7aekrt37

Abstract

Following the dynamics of several recent crowdsourcing projects that aim to collect linguistic data, this paper focuses on one such project in the field of Swiss German dialects and Swiss French accents. The main scientific goal of the collected data is to understand people’s perception of dialects and accents, and to provide a resource for future computational systems such as automatic dialect recognition. A gamified crowdsourcing platform was set up and launched for both main locales of Switzerland: “din dialäkt” (‘your dialect’) for Swiss German dialects and “ton accent” (‘your accent’) for Swiss French. The participant’s main activity is to localize preselected audio samples by clicking on a map of Switzerland. The media showed strong interest in the two platforms, and many reports appeared in newspapers, on television and on radio, which increased public awareness of the project and thus also traffic on the site. At this point in the project, 7,500 registered users (besides 30,000 anonymous visitors) have provided 470,000 localizations. By connecting users’ results on this localization task to their socio-demographic information, a quantitative analysis of the localization data can reveal which factors play a role in their performance. Preliminary results showed that age and childhood residence influence how well dialects/accents are recognized. Nevertheless, quantity does not ensure quality when it comes to data. Crowdsourcing such linguistic data revealed traps to avoid, such as scammers or participants’ quick loss of motivation, which causes them to click randomly. Such obstacles need to be taken into account when assessing the reliability of the data, and they require a number of preliminary steps before the data can be analyzed.

Details

Paper ID
lrec2018-main-234
Pages
N/A
BibKey
goldman-etal-2018-strategies
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 – 12 May 2018

Authors

  • Jean-Philippe Goldman
  • Simon Clematide
  • Mathieu Avanzi
  • Raphael Tandler
