The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/2w9ohr48zi4a

Abstract

We introduce the Denoised Web Treebank: a treebank including a normalization layer and a corresponding evaluation metric for dependency parsing of noisy text, such as Tweets. This benchmark enables the evaluation of parser robustness as well as text normalization methods, including normalization as machine translation and unsupervised lexical normalization, directly on syntactic trees. Experiments show that text normalization together with a combination of domain-specific and generic part-of-speech taggers can lead to a significant improvement in parsing accuracy on this test set.

Details

Paper ID
lrec2016-main-102
Pages
pp. 649-653
BibKey
daiber-van-der-goot-2016-denoised
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Joachim Daiber

  • Rob van der Goot
