TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Abstract

Detecting the novelty of an entire document is a frontier problem in Artificial Intelligence (AI). It matters for a wide range of Natural Language Processing (NLP) applications, from extractive document summarization to tracking the development of news events to predicting the impact of scholarly articles. Although the problem is highly relevant given today's exponential duplication of data, we are unaware of any document-level dataset that properly supports the evaluation of automatic novelty detection techniques in a classification framework. To bridge this gap, we present a resource for benchmarking document-level novelty detection techniques. We create the resource by periodically crawling topic-specific news documents across several domains. We release the annotated corpus with the necessary statistics and demonstrate its use with a system developed for the problem.

Details

Paper ID

lrec2018-main-559

Pages

N/A

DOI

10.63317/3hwywkyeb4hz

BibKey

ghosal-etal-2018-tap

Editors

Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

979-10-95546-00-9

Conference

Eleventh International Conference on Language Resources and Evaluation

Location

Miyazaki, Japan

Date

7 - 12 May 2018

Authors

Tirthankar Ghosal
Amitra Salam
Swati Tiwari
Asif Ekbal
Pushpak Bhattacharyya
