Parallel Creation of Gigaword Corpora for Medium Density Languages - an Interim Report

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/2d8rhg2q2rjb

Abstract

To speed up the development of gigaword language resources for medium resource density languages, we integrated several FOSS tools into the HUN* toolkit. While the speed and efficiency of the resulting pipeline have surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing stages can be fully automated.

Details

Paper ID
lrec2008-main-587
Pages
N/A
BibKey
halacsy-etal-2008-parallel
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 to 30 May 2008

Authors

  • Péter Halácsy

  • András Kornai

  • Péter Németh

  • Dániel Varga
