
A Comparative Study of Text Preprocessing Approaches for Topic Detection of User Utterances

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/5ge78oxtpbtf

Abstract

The paper describes a comparative study of existing and novel text preprocessing and classification techniques for domain detection of user utterances. Two corpora are considered. The first contains customer calls to a call centre, recorded for subsequent call routing; the second contains answers of call centre employees exhibiting different kinds of customer orientation behaviour. Seven unsupervised and supervised term weighting methods were applied. The collective use of term weighting methods is proposed to improve classification effectiveness. Four dimensionality reduction methods were applied: stop-word filtering with stemming, feature selection based on term weights, feature transformation based on term clustering, and a novel feature transformation method based on terms belonging to classes. The classification algorithms used were k-NN and an SVM-based algorithm. The numerical experiments show that the simultaneous use of the two novel approaches (collectives of term weighting methods and the novel feature transformation method) achieves high classification performance with a very small number of features.
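As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch combines one unsupervised term weighting scheme with a k-NN classifier. It makes several assumptions: TF-IDF stands in for the seven weighting methods (the abstract does not name them), cosine similarity is assumed as the k-NN proximity measure, and the toy utterances, labels, and function names are all hypothetical rather than taken from the paper's corpora or implementation.

```python
import math
from collections import Counter

# Hypothetical toy utterances; NOT drawn from the corpora used in the paper.
TRAIN = [
    ("my internet connection is down", "technical"),
    ("the router keeps dropping the connection", "technical"),
    ("i cannot connect to the internet", "technical"),
    ("i want to pay my bill", "billing"),
    ("there is a wrong charge on my bill", "billing"),
    ("how much is my monthly charge", "billing"),
]


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word filtering)."""
    return text.lower().split()


def idf_weights(docs):
    """Unsupervised term weight: log(N / document frequency) for each term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(text, weights):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokenize(text) if t in weights)
    return {t: count * weights[t] for t, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(text, train, weights, k=3):
    """Majority vote among the k training utterances most similar to text."""
    query = vectorize(text, weights)
    scored = sorted(
        ((cosine(query, vectorize(doc, weights)), label) for doc, label in train),
        key=lambda pair: -pair[0],
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


WEIGHTS = idf_weights([doc for doc, _ in TRAIN])

if __name__ == "__main__":
    print(knn_predict("my internet is down", TRAIN, WEIGHTS))
    print(knn_predict("wrong charge on my bill", TRAIN, WEIGHTS))
```

A supervised weighting scheme would replace `idf_weights` with a function that also sees the class labels; the rest of the sketch would stay the same.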

Details

Paper ID
lrec2016-main-288
Pages
pp. 1826-1831
BibKey
sergienko-etal-2016-comparative
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Roman Sergienko

  • Muhammad Shan

  • Wolfgang Minker
