
AraNLP: a Java-based Library for the Processing of Arabic Text.

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/2r58j2tkkx7s

Abstract

We present a free, Java-based library named “AraNLP” that covers various Arabic text preprocessing tools. Although a good number of tools for processing Arabic text already exist, integration and compatibility problems continually occur. AraNLP is an attempt to gather most of the vital Arabic text preprocessing tools into one easily accessible library, by integrating or accurately adapting existing tools and by developing new ones when required. The library includes a sentence detector, tokenizer, light stemmer, root stemmer, part-of-speech tagger (POS-tagger), word segmenter, normalizer, and a punctuation and diacritic remover.
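To make three of the listed steps concrete, here is a minimal sketch of diacritic removal, punctuation removal, and normalization in plain Java. It is not AraNLP's API: the class name `ArabicPreprocessSketch` and its method names are hypothetical. The normalization rules (alef variants to bare alef, alef maksura to yeh, teh marbuta to heh) are one common convention, assumed here for illustration, and may differ from what the library does.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration only; these names are not AraNLP's API. */
final class ArabicPreprocessSketch {

    // Tashkeel marks U+064B..U+0652, superscript alef U+0670, and tatweel U+0640
    // (tatweel is an elongation character, stripped here by assumption).
    private static final Pattern DIACRITICS =
            Pattern.compile("[\\u064B-\\u0652\\u0670\\u0640]");

    // Any character in a Unicode punctuation category, e.g. the Arabic comma U+060C.
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{P}");

    static String removeDiacritics(String text) {
        return DIACRITICS.matcher(text).replaceAll("");
    }

    static String removePunctuation(String text) {
        return PUNCTUATION.matcher(text).replaceAll("");
    }

    // Assumed normalization convention; other schemes exist.
    static String normalize(String text) {
        return text
                .replaceAll("[\\u0622\\u0623\\u0625]", "\u0627") // alef variants -> bare alef
                .replace('\u0649', '\u064A')   // alef maksura -> yeh
                .replace('\u0629', '\u0647');  // teh marbuta -> heh
    }

    public static void main(String[] args) {
        // A vocalized word followed by an Arabic comma.
        String raw = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F\u060C";
        System.out.println(normalize(removePunctuation(removeDiacritics(raw))));
    }
}
```

The three methods are independent, so a caller can apply only the steps a given task needs, in whatever order suits the downstream tool.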

Details

Paper ID
lrec2014-main-498
Pages
pp. 4134-4138
BibKey
althobaiti-etal-2014-aranlp
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 to 31 May 2014

Authors

  • Maha Althobaiti

  • Udo Kruschwitz

  • Massimo Poesio

Links