Morphological parsing of Swahili using crowdsourced lexical resources
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)
Abstract
We describe a morphological analyzer for the Swahili language, written in an extension of XFST/LEXC designed to make it easy to declare morphophonological patterns and import lexical resources. We supplemented the analyzer extensively with data from the Kamusi Project (kamusi.org), a user-contributed multilingual dictionary. Using this resource allowed us to achieve wide lexical coverage quickly, but the heterogeneous nature of user-contributed content also poses challenges when adapting it for use in an expert system.