A Large DataBase of Hypernymy Relations Extracted from the Web.

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

Hypernymy relations (those where a hyponym term shares an "isa" relationship with its hypernym) play a key role in many Natural Language Processing (NLP) tasks, e.g. ontology learning, automatically building or extending knowledge bases, or word sense disambiguation and induction. In fact, such relations may provide the basis for the construction of more complex structures such as taxonomies, or be used as effective background knowledge for many word understanding applications. We present a publicly available database containing more than 400 million hypernymy relations that we extracted from the CommonCrawl web corpus. We describe the infrastructure we developed to iterate over the web corpus, extract the hypernymy relations, and store them effectively in a large database. This collection of relations represents a rich source of knowledge and may be useful for many researchers. We offer the tuple dataset for public download and an Application Programming Interface (API) to help other researchers programmatically query the database.

Details

Paper ID

lrec2016-main-056

Pages

pp. 360-367

DOI

10.63317/2aqojnt6nje4

BibKey

seitner-etal-2016-large

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23 - 28 May 2016

Authors

Julian Seitner
Christian Bizer
Kai Eckert
Stefano Faralli
Robert Meusel
Heiko Paulheim
Simone Paolo Ponzetto
