LREC-COLING 2024 workshop

Automatic Manipulation of Training Corpora to Make Parsers Accept Real-world Text

Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

DOI:10.63317/5jyw7zms6o92

Abstract

This paper discusses how to build a practical syntactic analyzer and addresses the distributional differences between existing corpora and the documents a parser actually receives in applications. As a case study, we focus on text segments headed by a noun phrase rather than a main verb, and on sentences without sentence-final punctuation. Both are rare in a number of Universal Dependencies corpora but appear frequently in real-world use cases of syntactic parsers. We converted the training corpora so that their distribution is closer to that of realistic inputs, and obtained better scores both on general syntactic benchmarks and on a sentiment detection task, a typical application of dependency analysis.
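The abstract does not spell out the conversion procedure. As a rough illustration only, the sketch below shows two transformations in the spirit described: dropping sentence-final punctuation, and turning a nominal subtree into a standalone verbless training example. It assumes a simplified CoNLL-U representation with no multiword tokens or empty nodes. The `Token` class and the functions `drop_final_punct` and `extract_np` are hypothetical names invented here, not the authors' code.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Set

@dataclass(frozen=True)
class Token:
    id: int      # 1-based position in the sentence (simplified CoNLL-U ID)
    form: str
    upos: str
    head: int    # 0 means this token is the root
    deprel: str

def drop_final_punct(sent: List[Token]) -> List[Token]:
    """Hypothetical: remove a sentence-final PUNCT token if nothing depends on it."""
    if not sent or sent[-1].upos != "PUNCT":
        return list(sent)
    last = sent[-1]
    if any(t.head == last.id for t in sent):
        return list(sent)  # removing it would orphan its dependents
    # Only the last token is dropped, so the remaining IDs and heads stay valid.
    return list(sent[:-1])

def subtree_ids(sent: List[Token], head_id: int) -> Set[int]:
    """Collect the IDs of head_id and all of its descendants."""
    children: Dict[int, List[int]] = {}
    for t in sent:
        children.setdefault(t.head, []).append(t.id)
    out: Set[int] = set()
    stack = [head_id]
    while stack:
        i = stack.pop()
        out.add(i)
        stack.extend(children.get(i, []))
    return out

def extract_np(sent: List[Token], head_id: int) -> List[Token]:
    """Hypothetical: make a nominal subtree a standalone sentence rooted at the noun."""
    head = next(t for t in sent if t.id == head_id)
    if head.upos not in {"NOUN", "PROPN"}:
        raise ValueError("head must be a nominal")
    ids = subtree_ids(sent, head_id)
    kept = [t for t in sent if t.id in ids]  # preserve original word order
    new_id = {t.id: k for k, t in enumerate(kept, start=1)}
    out = []
    for t in kept:
        if t.id == head_id:
            out.append(replace(t, id=new_id[t.id], head=0, deprel="root"))
        else:
            # Every non-head token in the subtree has its parent inside the subtree.
            out.append(replace(t, id=new_id[t.id], head=new_id[t.head]))
    return out

# Toy example: "I like the red car ."
example = [
    Token(1, "I", "PRON", 2, "nsubj"),
    Token(2, "like", "VERB", 0, "root"),
    Token(3, "the", "DET", 5, "det"),
    Token(4, "red", "ADJ", 5, "amod"),
    Token(5, "car", "NOUN", 2, "obj"),
    Token(6, ".", "PUNCT", 2, "punct"),
]

if __name__ == "__main__":
    print([t.form for t in drop_final_punct(example)])
    print([(t.form, t.head, t.deprel) for t in extract_np(example, 5)])
```

A real conversion would also need to update the `# text` comment and `SpaceAfter` in the MISC column, and decide how often to apply each transformation; those choices are left open here.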

Details

Paper ID
lrec2024-ws-mwe-03
Pages
pp. 4-13
BibKey
kanayama-etal-2024-automatic
Publisher
European Language Resources Association (ELRA) and ICCL
Workshop
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Date
20-25 May 2024

Authors

  • Hiroshi Kanayama
  • Ran Iwamoto
  • Masayasu Muraoka
  • Takuya Ohko
  • Kohtaroh Miyamoto
