WordKit: a Python Package for Orthographic and Phonological Featurization
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Abstract
The modeling of psycholinguistic phenomena, such as word reading, with machine learning techniques requires the featurization of word stimuli into appropriate orthographic and phonological representations. Critically, the choice of features impacts the performance of machine learning algorithms, and can have important ramifications for the conclusions drawn from a model. As such, being able to featurize words with a variety of feature sets, without having to resort to different tools, is a beneficial development. In this work, we present wordkit, a Python package which allows users to switch between feature sets and featurizers through a uniform API, allowing for rapid prototyping. To the best of our knowledge, this is the first package to integrate a variety of orthographic and phonological featurizers. The package is fully compatible with scikit-learn, and hence can be integrated into other pipelines. Furthermore, the package is modular and extensible, allowing for the integration of a large variety of feature sets and featurizers. The package and documentation can be found at github.com/stephantul/wordkit.
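To illustrate what switching between featurizers behind a uniform API can look like, the following is a minimal sketch in plain Python. It is not wordkit's actual API: the class names OneHotCharacterFeaturizer and BigramFeaturizer are hypothetical, and the sketch only mimics the fit/transform convention used by scikit-learn transformers without importing scikit-learn or wordkit.

```python
# Hypothetical sketch: these classes are NOT part of wordkit. They only
# illustrate two orthographic featurizers sharing one fit/transform interface.


class BaseFeaturizer:
    """Shared interface: every featurizer exposes fit, transform, fit_transform."""

    def fit_transform(self, words, y=None):
        return self.fit(words).transform(words)


class OneHotCharacterFeaturizer(BaseFeaturizer):
    """Encode a word as concatenated one-hot vectors, one slot per position."""

    def fit(self, words, y=None):
        # Learn the alphabet and the longest word length from the data.
        self.alphabet_ = sorted(set("".join(words)))
        self.max_len_ = max(len(w) for w in words)
        return self

    def transform(self, words):
        index = {c: i for i, c in enumerate(self.alphabet_)}
        size = len(self.alphabet_)
        vectors = []
        for word in words:
            vec = [0] * (size * self.max_len_)
            for pos, char in enumerate(word[: self.max_len_]):
                if char in index:  # unseen characters are left as zeros
                    vec[pos * size + index[char]] = 1
            vectors.append(vec)
        return vectors


class BigramFeaturizer(BaseFeaturizer):
    """Encode a word as counts over the character bigrams seen during fit."""

    def fit(self, words, y=None):
        self.bigrams_ = sorted(
            {w[i : i + 2] for w in words for i in range(len(w) - 1)}
        )
        return self

    def transform(self, words):
        index = {b: i for i, b in enumerate(self.bigrams_)}
        vectors = []
        for word in words:
            vec = [0] * len(self.bigrams_)
            for i in range(len(word) - 1):
                j = index.get(word[i : i + 2])
                if j is not None:  # unseen bigrams are ignored
                    vec[j] += 1
            vectors.append(vec)
        return vectors


# Swapping the feature set only changes which object is constructed;
# the calling code stays the same.
words = ["cat", "cab", "tab"]
for featurizer in (OneHotCharacterFeaturizer(), BigramFeaturizer()):
    X = featurizer.fit_transform(words)
    print(type(featurizer).__name__, len(X), len(X[0]))
```

Because both classes expose the same three methods, downstream code does not need to know which feature set is in use. A transformer that is actually compatible with scikit-learn would typically return NumPy arrays rather than nested lists; lists are used here only to keep the sketch free of dependencies.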