
Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/2sgmsme8ccj4

Abstract

For the purpose of POS tagging noisy user-generated text, should normalization be handled as a preliminary task or is it possible to handle misspelled words directly in the POS tagging model? We propose in this paper a combined approach where some errors are normalized before tagging, while a Gated Recurrent Unit deep neural network based tagger handles the remaining errors. Word embeddings are trained on a large corpus in order to address both normalization and POS tagging. Experiments are run on Contact Center chat conversations, a particular type of formal Computer Mediated Communication data.

Details

Paper ID
lrec2018-main-014
Pages
N/A
BibKey
damnati-etal-2018-handling
Editors
Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7–12 May 2018

Authors

  • Géraldine Damnati
  • Jeremy Auguste
  • Alexis Nasr
  • Delphine Charlet
  • Johannes Heinecke
  • Frédéric Béchet

Links