Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4bmrp4966iou

Abstract

Large language models (LLMs) use pretraining to predict the subsequent word; however, their expansion requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, overlooking less-resourced languages (LRLs). This study proposed three strategies to enhance the performance of LRLs based on the publicly available MLLMs. First, the MLLM vocabularies of LRLs were expanded to enhance expressiveness. Second, bilingual data were used for pretraining to align the high- and less-resourced languages. Third, a high-quality small-scale instruction dataset was constructed and instruction-tuning was performed to augment the LRL. The experiments employed the Llama2 model and Korean was used as the LRL, which was quantitatively evaluated against other developed LLMs across eight tasks. Furthermore, a qualitative assessment was performed based on human evaluation and GPT4. Experimental results showed that our proposed Bllossom model exhibited superior performance in qualitative analyses compared to previously proposed Korean monolingual models.

Details

Paper ID: lrec2024-main-1095
Pages: pp. 12514-12526
BibKey: choi-etal-2024-optimizing
Editor: N/A
Publisher: European Language Resources Association (ELRA) and ICCL
ISSN: 2522-2686
ISBN: 979-10-95546-34-4
Conference: Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location: Turin, Italy
Date: 20-25 May 2024

Authors

  • ChangSu Choi
  • Yongbin Jeong
  • Seoyoon Park
  • Inho Won
  • HyeonSeok Lim
  • SangMin Kim
  • Yejee Kang
  • Chanhyuk Yoon
  • Jaewan Park
  • Yiseul Lee
  • HyeJin Lee
  • Younggyun Hahm
  • Hansaem Kim
  • KyungTae Lim
