Back to Main Conference 2016
LREC 2016main

Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/36wkbr6yqzcd

Abstract

Preprocessing is a preliminary step in many fields including IR and NLP. The effect of basic preprocessing settings on English for text summarization is well-studied. However, there is no such effort found for the Urdu language (with the best of our knowledge). In this study, we analyze the effect of basic preprocessing settings for single-document text summarization for Urdu, on a benchmark corpus using various experiments. The analysis is performed using the state-of-the-art algorithms for extractive summarization and the effect of stopword removal, lemmatization, and stemming is analyzed. Results showed that these pre-processing settings improve the results.

Details

Paper ID
lrec2016-main-585
Pages
pp. 3686-3693
BibKey
humayoun-yu-2016-analyzing
Editors
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 - 28 May 2016

Authors

  • MH

    Muhammad Humayoun

  • HY

    Hwanjo Yu

Links