Sentence Selection Strategies for Distilling Word Embeddings from BERT

Proceedings of the Thirteenth International Conference on Language Resources and Evaluation (LREC 2022)

DOI:10.63317/4e4c7ea2pzee

Abstract

Many applications crucially rely on the availability of high-quality word vectors. To learn such representations, several strategies based on language models have been proposed in recent years. While effective, these methods typically rely on a large number of contextualised vectors for each word, which makes them impractical. In this paper, we investigate whether similar results can be obtained when only a few contextualised representations of each word can be used. To this end, we analyse a range of strategies for selecting the most informative sentences. Our results show that with a careful selection strategy, high-quality word vectors can be learned from as few as 5 to 10 sentences.

Details

Paper ID
lrec2022-main-277
Pages
pp. 2591-2600
BibKey
wang-etal-2022-sentence
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-38-2
Conference
Thirteenth Language Resources and Evaluation Conference
Location
Marseille, France
Date
20-25 June 2022

Authors

  • Yixiao Wang
  • Zied Bouraoui
  • Luis Espinosa Anke
  • Steven Schockaert

Links