Insights from Transfer Learning Experiments with Word-in-Context and Word Sense Disambiguation Models
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Abstract
We investigate the relationship between Word-in-Context (WiC) and Word Sense Disambiguation (WSD) by examining how training on one or both tasks affects performance on the other. Using established English datasets, we train a unified sentence-transformer model (xlm-roberta-large) with target-word highlighting and a contrastive loss. Models are evaluated on WiC and WSD benchmarks across single-task, joint, and combined-dataset configurations. Results show that joint training consistently improves or maintains WiC performance, particularly in low-resource settings, while WSD benefits mainly when annotated data is limited. Cross-task experiments demonstrate strong transfer: WSD-trained models generalize effectively to WiC, and WiC-trained models outperform baselines on WSD, indicating shared context-sensitive lexical representations. Combining multiple WiC datasets further improves accuracy and stability. These findings highlight the complementary nature of WiC and WSD and show that unified training strategies can yield more robust and generalizable sense disambiguation models. The results provide practical guidance for designing datasets and models in multilingual and low-resource contexts, emphasizing the value of leveraging shared semantic representations.
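To make the two ingredients named above concrete, the sketch below shows one plausible realization of target-word highlighting and contrastive training on a WiC-style sentence pair. It is not the authors' released code: the marker tokens `<t>`/`</t>`, mean pooling, and `CosineEmbeddingLoss` are illustrative assumptions, and the paper may use different markers, pooling, or contrastive objective.

```python
# Minimal sketch (assumed, not the paper's implementation) of target-word
# highlighting plus a contrastive loss on top of xlm-roberta-large.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

# Target-word highlighting: wrap the target in marker tokens so the encoder
# knows which word's sense is being compared. The <t>/</t> markers are an
# illustrative choice, not necessarily the paper's.
tokenizer.add_special_tokens({"additional_special_tokens": ["<t>", "</t>"]})
model.resize_token_embeddings(len(tokenizer))

def highlight(sentence: str, target: str) -> str:
    """Wrap the first occurrence of the target word in marker tokens."""
    return sentence.replace(target, f"<t> {target} </t>", 1)

def embed(sentences: list[str]) -> torch.Tensor:
    """Mean-pooled sentence embeddings (one common pooling choice)."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (B, L, H)
    mask = batch["attention_mask"].unsqueeze(-1)     # (B, L, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Contrastive step on one WiC-style pair: target +1 if the highlighted word
# is used in the same sense in both sentences, -1 otherwise.
loss_fn = torch.nn.CosineEmbeddingLoss(margin=0.5)
left = embed([highlight("He sat on the bank of the river.", "bank")])
right = embed([highlight("She deposited cash at the bank.", "bank")])
loss = loss_fn(left, right, torch.tensor([-1.0]))    # different senses
loss.backward()
```

A WSD training pair can be cast into the same form by contrasting a usage example against a sense gloss, which is one way the unified model described in the abstract can serve both tasks.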