Sentence Rephrasing for Parsing Sentences with OOV Words

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/2nmx9hqvigms

Abstract

This paper addresses the problem of out-of-vocabulary (OOV) words, named entities in particular, in dependency parsing. OOV words, whose word forms are unknown to the learning-based parser, may decrease parsing performance. To deal with this problem, we propose a sentence rephrasing approach that replaces each OOV word in a sentence with a popular word of the same named entity type from the training set, so that knowledge of the word form can be used for parsing. Two strategies for selecting the replacement word are explored, a highest-frequency-based strategy and an information-retrieval-based strategy, and the Chinese Treebank 6.0 (CTB6) corpus is used to evaluate the feasibility of the proposed strategies. Experimental results show that rephrasing some specific types of OOV words, such as Corporation, Organization, and Competition, improves parsing performance. This methodology can be applied to domain adaptation to deal with OOV problems.
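
The highest-frequency-based strategy described in the abstract might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `build_replacements` and `rephrase`, the `(word, ne_type)` token format, and the toy English data are all assumptions made for this example (the paper itself evaluates on CTB6), and the information-retrieval-based strategy is not covered.

```python
from collections import Counter, defaultdict


def build_replacements(training_tokens):
    """Map each named-entity type to its most frequent word form.

    training_tokens: iterable of (word, ne_type) pairs, where ne_type
    is None for tokens that are not named entities (assumed format).
    """
    counts = defaultdict(Counter)
    for word, ne_type in training_tokens:
        if ne_type is not None:
            counts[ne_type][word] += 1
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}


def rephrase(sentence, vocabulary, replacements):
    """Swap each OOV named entity for the known word of the same type."""
    rephrased = []
    for word, ne_type in sentence:
        if word not in vocabulary and ne_type in replacements:
            rephrased.append(replacements[ne_type])
        else:
            # In-vocabulary words, and OOV words of an unseen type, are kept.
            rephrased.append(word)
    return rephrased


# Toy data, invented for illustration only.
training = [
    ("IBM", "Corporation"),
    ("IBM", "Corporation"),
    ("Intel", "Corporation"),
    ("buys", None),
    ("UN", "Organization"),
]
vocabulary = {word for word, _ in training}
replacements = build_replacements(training)

sentence = [("Acme", "Corporation"), ("buys", None), ("Intel", "Corporation")]
result = rephrase(sentence, vocabulary, replacements)
```

In this sketch the rephrased sentence would be passed to the parser in place of the original, with the original word forms restored in the output tree afterwards; that restoration step is assumed here rather than taken from the abstract.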

Details

Paper ID
lrec2014-main-485
Pages
pp. 2859-2862
BibKey
huang-etal-2014-sentence
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26-31 May 2014

Authors

  • Hen-Hsen Huang
  • Huan-Yuan Chen
  • Chang-Sheng Yu
  • Hsin-Hsi Chen
  • Po-Ching Lee
  • Chun-Hsun Chen
