Structured Partial Predictability in Non-Concatenative Morphology: The Case of Tashlhiyt Berber

Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)

Abstract

Non-concatenative morphology poses a persistent challenge for NLP, yet structured quantitative resources for Amazigh (Berber) languages remain scarce. We present the first large-scale computational study of Tashlhiyt Berber plural formation, drawing on a richly annotated dataset of 1,185 noun paradigms with phonological, morphological, and semantic features. We decompose the plural system into macro-level word-formation strategies and micro-level stem mutations, and evaluate predictability across ten target domains using linguistic feature models, n-gram baselines, and Bi-LSTM neural models. Results reveal a structured split: linguistic features decisively outperform neural models on systematic macro-level strategies (e.g., by 44.5 percentage points in F1), while Bi-LSTMs better capture lexically idiosyncratic patterns. Rather than supporting a categorical rule/memory divide, this complementarity reveals gradient layers of regularity within a single morphological system. These findings demonstrate the value of linguistically informed annotation for probing morphological complexity in low-resource, typologically diverse languages. All data, code, and models are publicly available.

Details

Paper ID

lrec2026-ws-slide-11

Pages

pp. 124-135

DOI

10.63317/32kvdo5hchjo

BibKey

alderete-etal-2026-structured

Editors

Erhard Hinrichs (Tübingen University, Germany), Joakim Nivre (Uppsala University, Sweden), Petya Osenova (Sofia University, Bulgaria), James Pustejovsky (Brandeis University, USA), Claus Zinn (Tübingen University, Germany)

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the Workshop on Structured Linguistic Data and Evaluation (SLiDE)

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

John Alderete
Hamza Sellami
