On the Usage of Kappa to Evaluate Agreement on Coding Tasks
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)
Abstract
In recent years, the Kappa coefficient of agreement has become the de facto standard for evaluating intercoder agreement in the discourse and dialogue processing community. Together with the adoption of this standard, researchers have adopted one specific scale for interpreting Kappa values, the one proposed by Krippendorff (1980). In this paper, I highlight some issues that should be taken into account when evaluating Kappa values. Finally, I speculate on whether Kappa could be used as a measure to evaluate a system's performance.
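For readers unfamiliar with the coefficient, the standard two-coder (Cohen) variant of Kappa compares observed agreement against the agreement expected by chance from each coder's marginal label distribution. The following is a minimal illustrative sketch (the function name and example labels are hypothetical, not from the paper):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items.

    kappa = (P_o - P_e) / (1 - P_e), where P_o is observed agreement
    and P_e is the agreement expected by chance.
    """
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two coders labelling six utterances.
a = ["yes", "no", "yes", "yes", "no", "yes"]
b = ["yes", "no", "no", "yes", "no", "yes"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

Note that Kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance; how high a value counts as "reliable" is exactly the interpretive question, tied to Krippendorff's scale, that the paper examines.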