Back to Main Conference 2022
LREC 2022main

Korean Language Modeling via Syntactic Guide

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/2tmk36btotak

Abstract

While pre-trained language models play a vital role in modern language processing tasks, but not every language can benefit from them. Most existing research on pre-trained language models focuses primarily on widely-used languages such as English, Chinese, and Indo-European languages. Additionally, such schemes usually require extensive computational resources alongside a large amount of data, which is infeasible for less-widely used languages. We aim to address this research niche by building a language model that understands the linguistic phenomena in the target language which can be trained with low-resources. In this paper, we discuss Korean language modeling, specifically methods for language representation and pre-training methods. With our Korean-specific language representation, we are able to build more powerful language models for Korean understanding, even with fewer resources. The paper proposes chunk-wise reconstruction of the Korean language based on a widely used transformer architecture and bidirectional language representation. We also introduce morphological features such as Part-of-Speech (PoS) into the language understanding by leveraging such information during the pre-training. Our experiment results prove that the proposed methods improve the model performance of the investigated Korean language understanding tasks.

Details

Paper ID
lrec2022-main-304
Pages
pp. 2841-2849
BibKey
kim-etal-2022-korean
Editors
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis2020
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
79-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20 - 25 June 2022

Authors

  • HK

    Hyeondey Kim

  • SK

    Seonhoon Kim

  • IK

    Inho Kang

  • NK

    Nojun Kwak

  • PF

    Pascale Fung

Links