Unsupervised Labelling of Mutation Triggers in Welsh
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
Initial consonant mutation is a key feature of Welsh, but its complexity poses significant challenges for both language learners and natural language processing (NLP) systems. While existing tools can reliably detect mutated forms, they provide no information about why a mutation occurs, i.e. which grammatical or lexical factors trigger the change. This paper introduces the task of mutation trigger labelling and presents the first computational attempt to analyse and explain the reasons behind Welsh mutations. We explore two preliminary approaches: (i) a linguistically informed rule-based system integrating Constraint Grammar rules, and (ii) large language models (LLMs) prompted in few-shot settings. Our experiments test the feasibility of automatically identifying and labelling the linguistic triggers behind Welsh mutations, using a dataset constructed from grammar reference books and public corpora, and establish baseline insights into how context-aware mutation analysis can be achieved. By framing mutation trigger labelling as a computational linguistics problem, this work lays groundwork for Welsh NLP and contributes to the broader development of explainable grammatical analysis for low-resource languages.