Two Architectures for Parallel Processing of Huge Amounts of Text

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/2jjts7anyufg

Abstract

This paper presents two alternative NLP architectures for analyzing massive numbers of documents using parallel processing. The two architectures target different processing scenarios, namely batch processing and streaming processing. The batch-processing architecture aims to optimize the overall throughput of the system, i.e., to minimize the total time spent processing all documents. The streaming architecture aims to minimize the time needed to process documents arriving in real time and is therefore especially suitable for live feeds. The paper presents experiments with both architectures and reports the overall gain when they are used for batch as well as for streaming processing. All the software described in the paper is publicly available under free licenses.

Details

Paper ID
lrec2016-main-714
Pages
pp. 4513-4519
BibKey
kattenberg-etal-2016-two
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Mathijs Kattenberg

  • Zuhaitz Beloki

  • Aitor Soroa

  • Xabier Artola

  • Antske Fokkens

  • Paul Huygen

  • Kees Verstoep
