The American National Corpus: More Than the Web Can Provide

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/38kxz26sc2sv

Abstract

The American National Corpus (ANC) project is developing a corpus of American English comparable to the British National Corpus (BNC). Recent interest in the web as a source of corpus materials has led some in the language processing community to suggest that developing a corpus of American English is unnecessary. We argue that, far from being rendered superfluous by the availability of web materials, the ANC is likely to provide a resource for developing web acquisition techniques that support tasks such as genre and language detection and automatic annotation. This paper compares the ANC, in both content and format, with a test corpus compiled from web data, and discusses points of intersection and divergence.

Details

Paper ID
lrec2002-main-303
Pages
N/A
BibKey
ide-etal-2002-american
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29 May 2002 to 31 May 2002

Authors

  • Nancy Ide

  • Randi Reppen

  • Keith Suderman
