Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/47wky8sbnwpa

Abstract

We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary application scenario for which we plan to build this resource is the automatic identification of web genres. Web genres are rather difficult to capture and to describe in their entirety, but we plan for the finished reference corpus to contain multi-level tags of the respective genre or genres a web document or a website instantiates. As the construction of such a corpus is by no means a trivial task, we discuss several alternatives that are, for the time being, mostly based on existing collections. Furthermore, we discuss a shared set of genre categories and a multi-purpose tool as two additional prerequisites for a reference corpus of web genres.

Details

Paper ID: lrec2008-main-048
Pages: N/A
BibKey: rehm-etal-2008-towards
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 2-9517408-4-0
Conference: Sixth International Conference on Language Resources and Evaluation
Location: Marrakech, Morocco
Date: 28 May 2008 to 30 May 2008

Authors

  • Georg Rehm
  • Marina Santini
  • Alexander Mehler
  • Pavel Braslavski
  • Rüdiger Gleim
  • Andrea Stubbe
  • Svetlana Symonenko
  • Mirko Tavosanis
  • Vedrana Vidulin