LREC 2018, Main Conference
Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Abstract
When POS tagging noisy user-generated text, should normalization be handled as a preliminary task, or can misspelled words be handled directly in the POS tagging model? In this paper we propose a combined approach in which some errors are normalized before tagging, while a tagger based on a Gated Recurrent Unit (GRU) deep neural network handles the remaining errors. Word embeddings are trained on a large corpus to support both normalization and POS tagging. Experiments are run on Contact Center chat conversations, a particular type of formal Computer Mediated Communication data.
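
The combined approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says normalization relies on word embeddings, whereas this sketch substitutes plain Levenshtein distance against a known vocabulary so that it runs with the standard library alone. The names `edit_distance`, `normalize`, `tag_with_normalization`, the `max_dist` threshold, and the `tagger` callable (a stand-in for the GRU-based tagger) are all hypothetical. The point it shows is the division of labor: only unambiguous errors are corrected before tagging, and every other token is passed through unchanged for the tagger to handle.

```python
from typing import Callable, List, Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(tokens: List[str], vocab: Set[str], max_dist: int = 1) -> List[str]:
    """Correct a token only when exactly one vocabulary word is within max_dist.

    Assumption: edit distance replaces the embedding-based similarity
    mentioned in the abstract. Ambiguous or unknown tokens are left as-is.
    """
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
            continue
        candidates = [w for w in vocab if edit_distance(tok, w) <= max_dist]
        out.append(candidates[0] if len(candidates) == 1 else tok)
    return out


def tag_with_normalization(
    tokens: List[str],
    vocab: Set[str],
    tagger: Callable[[List[str]], List[Tuple[str, str]]],
) -> List[Tuple[str, str]]:
    """Normalize the easy errors, then let the (hypothetical) tagger handle the rest."""
    return tagger(normalize(tokens, vocab))
```

In this sketch a token such as `frm` would be left untouched when both `form` and `from` are in the vocabulary, which mirrors the idea that uncertain cases are deferred to the tagging model rather than forced through normalization.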