
E-magyar – A Digital Language Processing System

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/2x92buwbygd5

Abstract

e-magyar is a new toolset for the analysis of Hungarian texts. It was produced as a collaborative effort of the Hungarian language technology community, integrating the best state-of-the-art tools, enhancing them where necessary, making them interoperable, and releasing them under a clear license. It is a free, open, modular text processing pipeline that is integrated into the GATE system, which offers further prospects of interoperability. From tokenizing to parsing and named entity recognition, existing tools were examined, and those selected for integration underwent varying amounts of overhaul so that they operate in the pipeline with a uniform encoding and run on the same Java platform. The tokenizer was rebuilt from the ground up, and the flagship module, the morphological analyzer, which is based on the Humor system, was given a new annotation system and reimplemented in the HFST framework. The system is aimed at a broad range of users, from language technology application developers to digital humanities researchers. It comes with a drag-and-drop demo on its website: http://e-magyar.hu/en/.

Details

Paper ID
lrec2018-main-208
Pages
N/A
BibKey
varadi-etal-2018-e
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7-12 May 2018

Authors

  • Tamás Váradi
  • Eszter Simon
  • Bálint Sass
  • Iván Mittelholcz
  • Attila Novák
  • Balázs Indig
  • Richárd Farkas
  • Veronika Vincze

Links