Challenges in the Detection of Dialect for Historical Languages; the Case of Old Irish Text Resources
Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Abstract
Old Irish poses particular challenges for automatic dialect detection. It is generally accepted that the language shows little trace of dialect. Moreover, extant Old Irish text resources introduce a considerable amount of additional variation, which could affect dialect identification applications. While some scholarship has suggested that certain features may be indicative of dialect, such hypotheses are difficult to substantiate where authorship is anonymous, or where the text itself is not associated with a particular geographical region. This paper describes the application of stylometric dialect detection techniques to Old Irish texts and discusses the features that emerge from this process as potential markers of dialect. The aim is not necessarily to identify Old Irish dialectal features, but rather to investigate the impact that Old Irish text resources could have on such applications. The paper does, however, add to the extant body of research by highlighting some features that stylometric dialect identification techniques might identify as stylistically distinct.