Fast and Robust POS tagger for Arabic Tweets Using Agreement-based Bootstrapping

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/4ds35xk9ijjc

Abstract

Part-of-Speech (POS) tagging is a key step in many NLP algorithms. However, tweets are difficult to POS tag because they are short, do not always follow formal grammar and standard spelling, and often use abbreviations to fit within the length limit. Arabic tweets also exhibit further linguistic phenomena, such as the use of different dialects, romanised Arabic, and borrowed foreign words. In this paper, we present an evaluation and a detailed error analysis of state-of-the-art POS taggers for Arabic when applied to Arabic tweets. On the basis of this analysis, we combine normalisation and external knowledge to handle the noisiness of the domain, and we exploit bootstrapping to construct extra training data in order to improve POS tagging for Arabic tweets. Our results show significant improvements over the performance of a number of well-known taggers for Arabic.

Details

Paper ID
lrec2016-main-238
Pages
pp. 1500-1506
BibKey
albogamy-ramsay-2016-fast
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Fahad Albogamy

  • Allan Ramsay
