Experimental Deployment of a Grid Virtual Organization for Human Language Technologies

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/4epxw6igiv4g

Abstract

We propose to create a grid virtual organization for human language technologies, initially with the main task of enabling linguistic researchers to use the existing distributed computing facilities of the European grid infrastructure for more efficient processing of large data sets. After a brief overview of modern grid computing, we present a number of common use cases of natural language processing tasks running on the grid, notably corpus annotation with morpho-syntactic tagging (a corpus of more than 600 million words annotated in less than a day), $n$-gram statistics processing of a corpus, and the creation of grid-backed, web-accessible services, with annotation and term extraction as examples. We lay out implementation considerations and common problems of using the grid for this type of task. We conclude with an outline of a simple action plan for evolving the infrastructure created for these experiments into a fully functional Human Language Technology grid Virtual Organization, with the goal of making the power of the European grid infrastructure available to the linguistic community.

Details

Paper ID
lrec2010-main-617
Pages
N/A
BibKey
javorsek-erjavec-2010-experimental
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 - 23 May 2010

Authors

  • Jan Jona Javoršek

  • Tomaž Erjavec

Links