LREC 2026 (main)

Evaluating Phonetically Weighted and Unweighted Distance Measures in Dialectometry

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/38ndhg759wui

Abstract

This paper compares phonetically weighted and unweighted string distance measures in dialectometry, examining how explicit phonetic modeling affects the quantitative representation of linguistic similarity. Using narrow IPA transcriptions from the German REDE corpus, we evaluate nine measures (Levenshtein distance, bigram and trigram overlap, cosine distance, Jaro-Winkler, Jaccard similarity, the Herrgen-Schmidt measure, and the Relative Identity Value) through correlational analysis, distributional comparison, stabilization testing, and multidimensional scaling. The phonetically weighted Herrgen-Schmidt measure consistently achieves the most balanced distance dispersion, the earliest stabilization, and the highest linguistic plausibility. Unweighted edit-based measures reproduce the same topological structure in compressed form, whereas distributional and overlap-based metrics introduce systematic scale distortions through exaggeration or compression. These findings establish explicit phonetic weighting as a principled and analytically efficient extension of standard dialectometric procedures: it enhances resolution and interpretive precision without altering the underlying relational geometry of dialect classifications.
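
To make the contrast between unweighted and phonetically weighted edit distances concrete, the following is a minimal sketch, not the Herrgen-Schmidt measure or any procedure from the paper. It assumes transcriptions are already split into segments, and the names `edit_distance`, `toy_phonetic_cost`, and `VOWELS`, as well as the cost values, are hypothetical choices made purely for illustration.

```python
from typing import Callable, Sequence


def unit_cost(x: str, y: str) -> float:
    """Unweighted substitution: every mismatch costs 1."""
    return 0.0 if x == y else 1.0


def edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Callable[[str, str], float] = unit_cost,
) -> float:
    """Edit distance over segment sequences.

    Insertions and deletions cost 1; substitutions cost sub_cost(x, y).
    With the default unit_cost this is plain Levenshtein distance.
    """
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [float(i)]
        for j, y in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1.0,               # delete x
                    curr[j - 1] + 1.0,           # insert y
                    prev[j - 1] + sub_cost(x, y) # substitute x -> y
                )
            )
        prev = curr
    return prev[-1]


# Hypothetical, deliberately crude segment classes (illustration only).
VOWELS = set("aeiouyøɛɔəɪʊ")


def toy_phonetic_cost(x: str, y: str) -> float:
    """Assumed weighting: swaps within the same class are cheaper."""
    if x == y:
        return 0.0
    if (x in VOWELS) == (y in VOWELS):
        return 0.5  # vowel-vowel or consonant-consonant
    return 1.0      # vowel-consonant


variant_a = ["h", "a", "ʊ", "s"]
variant_b = ["h", "u", "s"]

unweighted = edit_distance(variant_a, variant_b)
weighted = edit_distance(variant_a, variant_b, toy_phonetic_cost)
```

Under these assumed costs, the weighted variant assigns a smaller distance to the pair than the unweighted one because the vowel substitution is discounted. A real phonetic weighting would derive its costs from articulatory features rather than from a two-class split.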

Details

Paper ID
lrec2026-main-327
Pages
pp. 4141-4151
BibKey
lameli-2026-evaluating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Alfred Lameli
