Polysemy and Ambiguity: The Case of the French Modal Verb Devoir

Proceedings of the 22nd Joint ACL - ISO Workshop on Interoperable Semantic Annotation and Representation (ISA-22) @ LREC 2026

DOI:10.63317/4jddt6ku3yck

Abstract

This article focuses on a methodology for representing the semantics of polysemous markers whose meanings cannot (or need not) be disambiguated, even in context. We call this task (multi-)sense representation and present the French modal verb devoir as a case study. Specifically, we reframe this task, traditionally treated as a multi-class problem, as a multi-label classification problem in order to account for instances that remain ambiguous because of contextual and intentional factors. To fine-tune our model (CamemBERT), we implement an active learning loop that supports the annotation process, and we show that combining global and local features yields the best results (F1-micro = 0.83; F1-macro = 0.79). We then apply the model to two distinct corpora, showing that the automatic analysis of devoir’s modal senses provides deeper insight into modal verb usage and facilitates comparisons across corpora that differ in medium (spoken vs. written) or genre (e.g. legal discourse). Furthermore, our multi-label approach enables the detection and analysis of double-labeled instances, with applications in, for example, legal discourse interpretation and second language acquisition.
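The difference between the multi-class and multi-label framings, and the two F1 averages reported above, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a common multi-label setup in which each sense gets an independent sigmoid score and every sense at or above a threshold (0.5 here, an assumption) is kept. The label names `deontic`, `epistemic` and `other` are hypothetical placeholders, and the paper's actual sense inventory may differ. The functions `decode_multilabel`, `decode_multiclass` and `micro_macro_f1` are illustrative names.

```python
import math

# Hypothetical sense inventory for illustration only; the paper's actual
# labels for devoir may differ.
LABELS = ["deontic", "epistemic", "other"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_multiclass(logits):
    """Multi-class framing: exactly one sense, the highest-scoring one."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return {LABELS[best]}


def decode_multilabel(logits, threshold=0.5):
    """Multi-label framing (assumed setup): each sense is scored independently
    with a sigmoid, and every sense at or above the threshold is kept, so an
    ambiguous instance can come out double-labeled."""
    return {lab for lab, z in zip(LABELS, logits) if sigmoid(z) >= threshold}


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold, pred: lists of label sets, one set per instance."""
    counts = {lab: [0, 0, 0] for lab in LABELS}  # per-label [tp, fp, fn]
    for g, p in zip(gold, pred):
        for lab in LABELS:
            if lab in p and lab in g:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1
    # Micro: pool the counts over all labels, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(LABELS)
    return micro, macro


if __name__ == "__main__":
    logits = [2.0, 1.0, -3.0]  # made-up scores for one occurrence of devoir
    print(sorted(decode_multiclass(logits)))  # ['deontic']
    print(sorted(decode_multilabel(logits)))  # ['deontic', 'epistemic']

    gold = [{"deontic"}, {"epistemic"}, {"deontic", "epistemic"}, {"other"}]
    pred = [{"deontic"}, {"deontic"}, {"deontic", "epistemic"}, {"other"}]
    micro, macro = micro_macro_f1(gold, pred)
    print(round(micro, 3), round(macro, 3))  # 0.8 0.822
```

With the same made-up scores, the multi-class decoder is forced to choose one sense, while the multi-label decoder keeps both, which is the kind of double-labeled instance the abstract refers to. Macro-F1 weights every sense equally, so rarer senses pull it below micro-F1 when they are harder to predict. The figures printed here come from toy data and are unrelated to the reported scores.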