Back to Main Conference 2004
LREC 2004main

Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/47ks6kaj3eij

Abstract

One of the most difficult challenges faced by non-native speakers of English is mastering the system of English articles. We trained a maximum entropy classifier to select among a/an, the, or zero article for noun phrases, based on a set of features extracted from the local context of each. When the classifier was trained on 6 million noun phrases, its performance was correct about 88% of the time. We also used the classifier to detect article errors in the TOEFL essays of native speakers of Chinese, Japanese, and Russian. Agreement with human annotators was about 88% (kappa = 0.36). Many of the disagreements were due to the classifier's lack of discourse information. Performance rose to 94% agreement (kappa = 0.47) when the system accepted noun phrases as correct in cases where its own confidence was low.

Details

Paper ID
lrec2004-main-447
Pages
N/A
BibKey
han-etal-2004-detecting
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 28 May 2004

Authors

  • NH

    Na-Rae Han

  • MC

    Martin Chodorow

  • CL

    Claudia Leacock

Links