SCALE: A Scalable Language Engineering Toolkit

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/4vjahevtd9da

Abstract

In this paper we present SCALE, a new Python toolkit that contains two extensions to n-gram language models. The first extension is a novel technique for modeling compound words called Semantic Head Mapping (SHM). The second extension, Bag-of-Words Language Modeling (BagLM), bundles popular models such as Latent Semantic Analysis and Continuous Skip-grams. Both extensions scale to large data and can be integrated into first-pass ASR decoding. The toolkit is open source, includes working examples, and is available at http://github.com/jorispelemans/scale.

Details

Paper ID
lrec2016-main-612
Pages
pp. 3868-3871
BibKey
pelemans-etal-2016-scale
Editors
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Joris Pelemans

  • Lyan Verwimp

  • Kris Demuynck

  • Hugo Van hamme

  • Patrick Wambacq
