A Test Collection for Part-of-Speech Tagging and Word Sense Disambiguation

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/5mukdiuk65f4

Abstract

We evaluate a focused test collection at the intersection of part-of-speech tagging and word-sense disambiguation. The collection targets words such as train, novel, and lean, where part-of-speech contrasts align with clear meaning differences. We use it to detect regressions across tagger versions, track quantitative and qualitative progress over time, and test robustness to orthographic variation. In our experiments, the Stanford and TnT taggers reach 68% accuracy on the collection, compared with 92% for a recent spaCy transformer model. The earlier taggers erred mainly on noun–verb distinctions; spaCy’s errors more often involve noun–adjective distinctions. Uppercase text roughly doubles error rates for all taggers. We discuss common problems and propose directions for future testing.

Details

Paper ID
lrec2026-main-925
Pages
pp. 11813-11821
BibKey
krovetz-2026-test
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11-16 May 2026

Authors

  • Robert Krovetz

Links