Two Architectures for Parallel Processing of Huge Amounts of Text

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

This paper presents two alternative NLP architectures for analyzing massive numbers of documents using parallel processing. The two architectures target different processing scenarios, namely batch processing and stream processing. The batch-processing architecture aims to optimize the overall throughput of the system, i.e., to minimize the total time spent processing all documents. The streaming architecture aims to minimize the time needed to process documents arriving in real time and is therefore especially suitable for live feeds. The paper presents experiments with both architectures and reports the overall gain when they are used for batch as well as for streaming processing. All the software described in the paper is publicly available under free licenses.

Details

Paper ID

lrec2016-main-714

Pages

pp. 4513-4519

DOI

10.63317/2jjts7anyufg

BibKey

kattenberg-etal-2016-two

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23-28 May 2016

Authors

Mathijs Kattenberg
Zuhaitz Beloki
Aitor Soroa
Xabier Artola
Antske Fokkens
Paul Huygen
Kees Verstoep
