Analyzing Pre-processing Settings for Urdu Single-document Extractive Summarization

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

Preprocessing is a preliminary step in many fields, including information retrieval (IR) and natural language processing (NLP). The effect of basic preprocessing settings on text summarization is well studied for English. To the best of our knowledge, however, no comparable study exists for Urdu. In this study, we analyze the effect of basic preprocessing settings on single-document text summarization for Urdu through a series of experiments on a benchmark corpus. The analysis uses state-of-the-art algorithms for extractive summarization and examines the effect of stopword removal, lemmatization, and stemming. The results show that these preprocessing settings improve summarization results.
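
To make the notion of a "preprocessing setting" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes whitespace tokenization, a tiny hypothetical stopword list (`STOPWORDS`), and a simple term-frequency sentence scorer standing in for the state-of-the-art extractive algorithms the study uses. The `normalize` parameter is a hypothetical hook where a stemmer or lemmatizer could be plugged in; none is provided here.

```python
from collections import Counter

# Hypothetical mini stopword list of common Urdu function words.
# A real experiment would use a curated list; this one is only illustrative.
STOPWORDS = {"کا", "کی", "کے", "ہے", "میں", "اور", "سے", "کو"}


def preprocess(sentence, remove_stopwords=True, normalize=None):
    """Tokenize on whitespace and apply the chosen preprocessing settings."""
    tokens = sentence.split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if normalize is not None:
        # `normalize` is an assumed hook for a stemmer or lemmatizer.
        tokens = [normalize(t) for t in tokens]
    return tokens


def summarize(sentences, k=1, **settings):
    """Return the k highest-scoring sentences in their original order.

    Each sentence is scored by the average document-level frequency of
    its preprocessed tokens (a simple stand-in for a real summarizer).
    """
    processed = [preprocess(s, **settings) for s in sentences]
    freq = Counter(t for toks in processed for t in toks)

    def score(toks):
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(processed[i]),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Calling `summarize` on the same sentences with different settings, for example `remove_stopwords=True` versus `remove_stopwords=False`, gives a small-scale picture of the kind of comparison the study performs across preprocessing configurations.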

Details

Paper ID

lrec2016-main-585

Pages

pp. 3686-3693

DOI

10.63317/36wkbr6yqzcd

BibKey

humayoun-yu-2016-analyzing

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23 - 28 May 2016

Authors

Muhammad Humayoun
Hwanjo Yu
