Back to Main Conference 2016
LREC 2016main

Hypergraph Modelization of a Syntactically Annotated English Wikipedia Dump

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/346oz8gtzhz6

Abstract

Wikipedia, the well known internet encyclopedia, is nowadays a widely used source of information. To leverage its rich information, already parsed versions of Wikipedia have been proposed. We present an annotated dump of the English Wikipedia. This dump draws upon previously released Wikipedia parsed dumps. Still, we head in a different direction. In this parse we focus more into the syntactical characteristics of words: aside from the classical Part-of-Speech (PoS) tags and dependency parsing relations, we provide the full constituent parse branch for each word in a succinct way. Additionally, we propose a hypergraph network representation of the extracted linguistic information. The proposed modelization aims to take advantage of the information stocked within our parsed Wikipedia dump. We hope that by releasing these resources, researchers from the concerned communities will have a ready-to-experiment Wikipedia corpus to compare and distribute their work. We render public our parsed Wikipedia dump as well as the tool (and its source code) used to perform the parse. The hypergraph network and its related metadata is also distributed.

Details

Paper ID
lrec2016-main-372
Pages
pp. 2348-2353
BibKey
morales-etal-2016-hypergraph
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 28 May 2016

Authors

  • EM

    Edmundo Pavel Soriano Morales

  • JA

    Julien Ah-Pine

  • SL

    Sabine Loudcher

Links