LREC 2010 (main conference)

Annotating the Enron Email Corpus with Number Senses

Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010)

DOI:10.63317/2cni8pxjv4w8

Abstract

The Enron Email Corpus provides "real world" text in the business email domain, which is a target domain for many speech and language applications. We present a section of this corpus annotated with number senses, labelling each number as a date, time, year, telephone number, etc. We show that the sense categories and their frequencies in this domain differ substantially from those in newswire text. The annotated corpus can provide valuable material for the development of number sense disambiguation techniques. We have released the annotations into the public domain to allow other researchers to perform comparisons.

Details

Paper ID
lrec2010-main-444
Pages
N/A
BibKey
moore-etal-2010-annotating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-6-7
Conference
Seventh International Conference on Language Resources and Evaluation
Location
Valletta, Malta
Date
17 May 2010 to 23 May 2010

Authors

  • Stuart Moore
  • Sabine Buchholz
  • Anna Korhonen
