A Vocabulary Analysis of News Articles in Relation to the Political Orientation of Their Source and Their Thematic
Proceedings of the Second Workshop on Building Educational Applications Using NLP
Abstract
Understanding how political orientation influences lexical choices is essential for detecting bias and framing in news media. In this paper, we present a computational framework for identifying nouns whose interpretation varies across politically divergent newspapers. Using a large corpus of French news articles published in 2024, we categorize texts by topic and political orientation. We then use contextual embeddings to cluster the occurrences of each noun, which lets us detect semantic variation and dissimilarity among sources. This allows us to map semantic distances between newspapers and identify polarized or editorially marked lexical choices. Our results show that topic, polysemy, and editorial priorities contribute differently to lexical divergence. We discuss these findings and highlight how contextual embeddings can help reveal semantic biases that would remain invisible to frequency-based methods. We conclude by outlining perspectives for improving topic classification and the clustering method, exploring alternative divergence measures, conducting a qualitative analysis of our results, and extending the framework to other languages or genres.
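To make the idea of mapping semantic distances between newspapers concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes that each occurrence of a noun has already been assigned a cluster label, for example by clustering its contextual embedding, and it compares sources by how their occurrences are distributed over those clusters. The Jensen-Shannon divergence is an illustrative choice only, since the abstract does not name a specific measure. The function names and the toy data (`paper_A`, `paper_B`, `paper_C`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cluster_distribution(labels, n_clusters):
    """Relative frequency of each cluster id among one source's occurrences."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, bounded between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def source_distances(occurrences, n_clusters):
    """Pairwise divergence between sources for a single noun.

    occurrences: list of (source, cluster_id) pairs, one per occurrence.
    """
    by_source = {}
    for source, cluster_id in occurrences:
        by_source.setdefault(source, []).append(cluster_id)
    dists = {
        source: cluster_distribution(labels, n_clusters)
        for source, labels in by_source.items()
    }
    return {
        (a, b): js_divergence(dists[a], dists[b])
        for a, b in combinations(sorted(dists), 2)
    }


# Toy data invented for illustration: three sources, two usage clusters.
occurrences = [
    ("paper_A", 0), ("paper_A", 0), ("paper_A", 1),
    ("paper_B", 0), ("paper_B", 0), ("paper_B", 1),
    ("paper_C", 1), ("paper_C", 1), ("paper_C", 1),
]
distances = source_distances(occurrences, n_clusters=2)
```

In this toy example, `paper_A` and `paper_B` use the noun in the same proportions across clusters and so have zero divergence, while `paper_C` uses it only in the second cluster and is therefore farther from both. A frequency-based comparison would see three occurrences per source and report no difference.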