Advances in Pre-Training Distributed Word Representations

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Abstract

Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl. In this paper, we show how to train high-quality word vector representations by using a combination of known tricks that are, however, rarely used together. The main result of our work is a new set of publicly available pre-trained models that outperform the current state of the art by a large margin on a number of tasks.

Details

Paper ID

lrec2018-main-008

Pages

N/A

DOI

10.63317/4b3prw5a5tze

BibKey

mikolov-etal-2018-advances

Editors

Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

979-10-95546-00-9

Conference

Eleventh International Conference on Language Resources and Evaluation

Location

Miyazaki, Japan

Date

7 - 12 May 2018

Authors

Tomas Mikolov
Edouard Grave
Piotr Bojanowski
Christian Puhrsch
Armand Joulin
