
Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI: 10.63317/4h8nkuscdqqa

Abstract

Memory is one of the most essential cognitive functions, serving as a repository of world knowledge and episodes of activities. In recent years, large-scale pre-trained language models have shown remarkable memorizing ability. In contrast, vanilla neural networks without pre-training have long been observed to suffer from the catastrophic forgetting problem. To investigate this retentive-forgetful contradiction and to understand the dynamic memorization mechanism of language models, we conduct thorough experiments controlling the target knowledge types, learning strategies, and learning schedules. We find that: 1) Vanilla language models without pre-training are forgetful; 2) Pre-training leads to retentive language models; 3) Knowledge relevance and diversification significantly influence memory formation. These conclusions are useful for understanding the abilities of pre-trained language models and shed light on the design and evaluation of new learning and inference algorithms for language models.

Details

Paper ID
lrec2024-main-1222
Pages
pp. 14016–14036
BibKey
cao-etal-2024-retentive
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20–25 May 2024

Authors

  • Boxi Cao

  • Qiaoyu Tang

  • Hongyu Lin

  • Shanshan Jiang

  • Bin Dong

  • Xianpei Han

  • Jiawei Chen

  • Tianshu Wang

  • Le Sun
