Recruitment Techniques for Minority Language Speech Databases: Some Observations
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)
Abstract
This paper describes the collection efforts for SpeechDat Cymru, a 2000-speaker database for Welsh, a minority language spoken by about 500,000 of the Welsh population. The database is part of the SpeechDat(II) project. General database details are discussed insofar as they affect recruitment strategies, and likely differences between minority language spoken language resource (SLR) and general SLR collection are noted. Individual recruitment techniques are then detailed, with an indication of their relative successes and relevance to minority language SLR collection generally. It is observed that no one technique was sufficient to collect the entire database, and that those techniques involving face-to-face recruitment by an individual closely involved with the database collection produced the best yields for effort expended. More traditional postal recruitment techniques were less successful. The experiences during collection underlined the importance of utilising enthusiastic recruiters, and taking advantage of the speaker networks present in the community.