Back to Main Conference 2002
LREC 2002main

Building and annotating a corpus for the study of journalistic text reuse

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/3dqyw7g68fnr

Abstract

In this paper we present the METER Corpus, a novel resource for the study and analysis of journalistic text reuse. The corpus consists of a set of news stories written by the Press Association (PA), the major UK news agency, and a set of stories about the same news events, as published in various British newspapers. In some cases the newspaper stories are rewritten from the PA source; in other cases they have been independently written by the newspapers' own journalists. We discuss the motivation for creating the corpus, its contents, the annotation of certain attributes for analysis of text reuse and finally the encoding of those annotations into a standardised corpus format: the Text Encoding Initiative (TEI).

Details

Paper ID
lrec2002-main-218
Pages
N/A
BibKey
clough-etal-2002-building
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29 May 2002 31 May 2002

Authors

  • PC

    Paul Clough

  • RG

    Robert Gaizauskas

  • SP

    S. L. Piao

Links