Huge Automatically Extracted Training-Sets for Multilingual Word Sense Disambiguation

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Abstract

We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages, for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be effectively used as training sets for supervised WSD systems, surpassing the state of the art for low-resourced languages and providing competitive results for English, where manually annotated training sets are available. The data is available at trainomatic.org.

Details

Paper ID

lrec2018-main-268

Pages

N/A

DOI

10.63317/4dxiepzhjumm

BibKey

pasini-etal-2018-huge

Editors

Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

979-10-95546-00-9

Conference

Eleventh International Conference on Language Resources and Evaluation

Location

Miyazaki, Japan

Date

7 - 12 May 2018

Authors

Tommaso Pasini
Francesco Elia
Roberto Navigli
