A Novel Typology of Mutually Intelligible Words: The Case of Slavic Languages

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

Abstract

In this paper, we demonstrate that using the notion of cognate when evaluating mutual intelligibility (MI) between closely related languages can be confusing. We suggest a new term, percipiants, which covers the MI of words irrespective of whether they share a common origin. We propose four classes of percipiants, which are differentiated by the degree and type of closeness of meaning within the word pair. Furthermore, we claim that the MI of individual words across a set of languages may be established computationally via normalized Levenshtein Distance (LD). We verify our hypotheses by analyzing data from a psycholinguistic experiment in which respondents had to predict words missing from a text in their native language. In the experimental condition, respondents had access to the text both in their native language and in another language from the same language group (Slavic); in the control condition, respondents performed the same test with the text in their native language only. The analysis demonstrates that (a) psycholinguistically, MI may be defined as the difference between the average correctness of answers in the experimental and the control groups; (b) normalized LD may serve as an adequate predictor of the experimentally measured MI; (c) this MI corroborates the four classes of percipiants; and (d) contextual factors weaken the predictive force of LD, which requires further investigation.
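The two quantities in the abstract can be made concrete with a short sketch. It is not the authors' implementation. It assumes several things the abstract does not state: LD is normalized by the length of the longer word, the words being compared are already in a common script (for example, Cyrillic forms transliterated to Latin beforehand), and each answer is scored as 1 (correct) or 0 (incorrect). MI for a word is then the mean correctness in the experimental group minus the mean correctness in the control group. The function names `levenshtein`, `normalized_ld` and `mutual_intelligibility` are hypothetical.

```python
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_ld(a: str, b: str) -> float:
    """Assumed normalization: divide by the longer word's length, giving [0, 1].
    Assumes both words are already in the same script."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mutual_intelligibility(experimental: list[int], control: list[int]) -> float:
    """MI of one word: mean correctness (1 = correct, 0 = incorrect) in the
    experimental group minus mean correctness in the control group."""
    return mean(experimental) - mean(control)


if __name__ == "__main__":
    # Illustrative pair: Polish "mleko" and Croatian "mlijeko" ('milk').
    print(normalized_ld("mleko", "mlijeko"))  # 2 insertions / 7 letters
    # Made-up answer scores for illustration only, not experimental data.
    print(mutual_intelligibility([1, 1, 1, 0], [1, 0, 0, 0]))
```

In this sketch, a word pair with a lower normalized LD would be expected to show a higher MI. Dividing by the longer word is only one common normalization. Other choices, such as the average of the two lengths or an alignment length, would change the scale of the scores.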