UPPC - Urdu Paraphrase Plagiarism Corpus

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

Paraphrase plagiarism is a significant and widespread problem, and research shows that it is hard to detect. Several methods and automatic systems have been proposed to deal with it. However, evaluating and comparing such solutions is not possible because benchmark corpora with manually created examples of paraphrase plagiarism are unavailable. To address this issue, we present a paraphrase plagiarism corpus containing simulated (manually created) examples in Urdu, a language widely spoken around the world. This resource is the first of its kind developed for the Urdu language, and we believe that it will be a valuable contribution to the evaluation of paraphrase plagiarism detection systems.

Details

Paper ID

lrec2016-main-289

Pages

pp. 1832-1836

DOI

10.63317/2ohjjih6m5jb

BibKey

sharjeel-etal-2016-uppc

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23 - 28 May 2016

Authors

Muhammad Sharjeel
Paul Rayson
Rao Muhammad Adeel Nawab