
Advances in Pre-Training Distributed Word Representations

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/4b3prw5a5tze

Abstract

Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl. In this paper, we show how to train high-quality word vector representations by using a combination of known tricks that are however rarely used together. The main result of our work is the new set of publicly available pre-trained models that outperform the current state of the art by a large margin on a number of tasks.
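One of the "known tricks" combined in this line of work is enriching word vectors with subword information, as in fastText: each word is decomposed into character n-grams with boundary markers, and the word's vector is built from the vectors of those n-grams. The sketch below only illustrates the n-gram extraction step; the function name `char_ngrams` and its parameters are illustrative, not part of the paper's released code.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word, with boundary markers
    '<' and '>' added so that prefixes and suffixes are distinguished
    from word-internal character sequences."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

# With n fixed to 3, "where" yields:
# ['<wh', 'whe', 'her', 'ere', 're>']
print(char_ngrams("where", 3, 3))
```

Summing the vectors of these n-grams (together with the whole-word token) is what lets subword-aware models produce representations for words never seen during training.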

Details

Paper ID
lrec2018-main-008
Pages
N/A
BibKey
mikolov-etal-2018-advances
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7–12 May 2018

Authors

  • Tomas Mikolov
  • Edouard Grave
  • Piotr Bojanowski
  • Christian Puhrsch
  • Armand Joulin
