LREC 2018 main conference

A Fast and Accurate Vietnamese Word Segmenter

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/32m47vrkp2wj

Abstract

We propose a novel approach to Vietnamese word segmentation. Our approach is based on the Single Classification Ripple Down Rules methodology (Compton and Jansen, 1990), in which rules are stored in an exception structure and new rules are added only to correct segmentation errors made by existing rules. Experimental results on the benchmark Vietnamese treebank show that our approach outperforms the previous state-of-the-art segmenters JVnSegmenter, vnTokenizer, DongDu and UETsegmenter in terms of both accuracy and speed. Our code is open-source and available at: https://github.com/datquocnguyen/RDRsegmenter.
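To make the "exception structure" concrete, here is a minimal sketch of a generic Single Classification Ripple Down Rules (SCRDR) tree, assuming segmentation is framed as labelling each syllable "B" (begins a word) or "I" (inside a word). Each node has a condition and a conclusion; an except child refines a node that fired, and an if-not child is tried when a node does not fire. The answer is the conclusion of the last node whose condition was satisfied, and a new rule is attached exactly where evaluation stopped, so it only affects the cases it was added to correct. The names `Node`, `classify` and `add_rule`, the feature keys `prev` and `syl`, and the B/I tags are all hypothetical illustrations; this is not the RDRsegmenter code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical context: features describing the syllable being labelled.
Context = Dict[str, str]


@dataclass
class Node:
    condition: Callable[[Context], bool]
    conclusion: str
    except_child: Optional["Node"] = None  # refines this rule when it fires
    if_not: Optional["Node"] = None        # alternative tried when it does not fire


def classify(root: Node, ctx: Context) -> str:
    """Return the conclusion of the last node whose condition holds."""
    result = root.conclusion
    node: Optional[Node] = root
    while node is not None:
        if node.condition(ctx):
            result = node.conclusion
            node = node.except_child
        else:
            node = node.if_not
    return result


def add_rule(root: Node, ctx: Context,
             condition: Callable[[Context], bool], conclusion: str) -> None:
    """Attach a new rule where evaluation of ctx stopped (SCRDR-style)."""
    node: Optional[Node] = root
    last, satisfied = root, True
    while node is not None:
        last = node
        satisfied = node.condition(ctx)
        node = node.except_child if satisfied else node.if_not
    new = Node(condition, conclusion)
    if satisfied:
        last.except_child = new  # exception to the last rule that fired
    else:
        last.if_not = new        # alternative to the last rule that failed


# Usage: the default rule says every syllable starts a new word.
root = Node(lambda c: True, "B")
ctx = {"prev": "học", "syl": "sinh"}  # "học sinh" (student) is one word
print(classify(root, ctx))  # B (an error to correct)
add_rule(root, ctx, lambda c: c["prev"] == "học" and c["syl"] == "sinh", "I")
print(classify(root, ctx))  # I
print(classify(root, {"prev": "tôi", "syl": "học"}))  # B (unaffected)
```

Because a correction is only reachable through the path that produced the error, adding a rule never changes the output for contexts that were already handled by a different path, which is what lets rules be accumulated incrementally.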

Details

Paper ID
lrec2018-main-410
Pages
N/A
BibKey
nguyen-etal-2018-fast
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7–12 May 2018

Authors

  • Dat Quoc Nguyen
  • Dai Quoc Nguyen
  • Thanh Vu
  • Mark Dras
  • Mark Johnson
