An eRulemaking Corpus: Identifying Substantive Issues in Public Comments

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/3okpkh2ezgnj

Abstract

We describe the creation of a corpus that supports a real-world hierarchical text categorization task in the domain of electronic rulemaking (eRulemaking). Features of the task and of the eRulemaking domain engender both a non-traditional text categorization corpus and a correspondingly difficult machine learning task. Interannotator agreement results are presented for a group of six annotators. We also briefly describe the results of experiments that apply standard and hierarchical text categorization techniques to the eRulemaking data sets. The corpus is the first in a series of related sentence-level text categorization corpora to be developed in the eRulemaking domain.

Details

Paper ID
lrec2008-main-107
Pages
N/A
BibKey
cardie-etal-2008-erulemaking
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 to 30 May 2008

Authors

  • Claire Cardie

  • Cynthia Farina

  • Matt Rawding

  • Adil Aijaz

Links