PyRATA, Python Rule-based feAture sTructure Analysis
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Abstract
In this paper, we present a new Python 3 module named PyRATA, which stands for ”Python Rule-based feAture sTructure Analysis”. The module is released under the Apache V2 license. It aims at supporting rules-based analysis on structured data. PyRATA offers a language expressiveness which covers the functionalities of all the concurrent modules and more. Designed to be intuitive, the pattern syntax and the engine API follow existing standard definitions; Respectively Perl regular expression syntax and Python re module API. Using a simple native Python data structure (i.e. sequence of feature sets) allows it to deal with various kinds of data (textual or not) at various levels, such as a list of words, a list of sentences, a list of posts of a forum thread, a list of events of a calendar... This specificity makes it free from any (linguistic) process.