Beyond Accuracy: Analyzing Dialect Confusion in Automatic Speech-Based Dialect Classification

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

Abstract

Automatic dialect classification is commonly treated as a supervised task with a primary focus on overall accuracy. In this paper, we argue that classification errors and model uncertainty provide valuable insights into dialectal structure and variation. We analyze a speech-based dialect classification model trained on German dialect data from three generations and evaluated across 250 speaker-disjoint splits (median weighted F1=0.42). A systematic confusion analysis shows that misclassifications are largely explained by speaker diversity, dialectal similarity, geographical proximity, and speaker self-assessment. Among these factors, the number of speakers per dialect has the strongest impact on performance, while frequent confusions between closely related dialects reflect inherent linguistic similarity rather than model limitations. Generational analyses further indicate that younger speakers exhibit reduced dialectal distinctiveness, although core dialectal features remain shared across generations. By explicitly modeling classification uncertainty, the proposed approach enables the analysis of dialect transition areas and gradient dialect boundaries. Overall, this work demonstrates that automatic dialect classification can serve not only as a predictive task but also as a tool for dialectological analysis.
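
The evaluation protocol summarized above (speaker-disjoint splits, weighted F1, confusion counts) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 20% test fraction, the seeding scheme, and the `predict_fn` callback are assumptions introduced here for illustration only.

```python
import random
import statistics
from collections import Counter


def speaker_disjoint_split(speakers, test_fraction=0.2, seed=0):
    """Split utterance indices so that no speaker occurs in both sets.

    `speakers` holds one speaker ID per utterance. The 20% default is an
    assumption, not a value taken from the paper.
    """
    rng = random.Random(seed)
    unique = sorted(set(speakers))
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_speakers = set(unique[:n_test])
    train_idx = [i for i, s in enumerate(speakers) if s not in test_speakers]
    test_idx = [i for i, s in enumerate(speakers) if s in test_speakers]
    return train_idx, test_idx


def confusion_counts(y_true, y_pred):
    """Count (true dialect, predicted dialect) pairs."""
    return Counter(zip(y_true, y_pred))


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    conf = confusion_counts(y_true, y_pred)
    support = Counter(y_true)
    predicted = Counter(y_pred)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = conf[(label, label)]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


def median_weighted_f1(speakers, labels, predict_fn, n_splits=250):
    """Median weighted F1 over repeated speaker-disjoint splits.

    `predict_fn(train_idx, test_idx)` is a hypothetical callback that trains
    a model on the training indices and returns one label per test index.
    """
    scores = []
    for seed in range(n_splits):
        train_idx, test_idx = speaker_disjoint_split(speakers, seed=seed)
        y_true = [labels[i] for i in test_idx]
        scores.append(weighted_f1(y_true, predict_fn(train_idx, test_idx)))
    return statistics.median(scores)
```

In this sketch, off-diagonal entries of `confusion_counts` (pairs where the true and predicted dialect differ) are the quantities a confusion analysis would relate to factors such as dialectal similarity or geographical proximity.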