A corpus of general and specific sentences from news
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)
Abstract
We present a corpus of sentences from news articles that are annotated as general or specific. We employed annotators on Amazon Mechanical Turk to mark sentences from three kinds of news articles: reports on events, finance news, and science journalism. We introduce the resulting corpus, with a focus on annotator agreement, the proportion of general and specific sentences in the articles, and results for automatic classification of the two sentence types.