Modeling Topics as Linguistic Linked Open Data: A First Attempt Using BERTopic, OntoLex-Lemon and FrAC
Proceedings of 10th Workshop on Linked Data in Linguistics (LDL-2026)
Abstract
Parliamentary discourse constitutes a key domain in which political actors publicly articulate policy positions and priorities through language. This study investigates debates from the Italian Chamber of Deputies (1948–2006) to identify and analyse latent semantic themes and their evolution using BERTopic-based dynamic topic modeling. The analysis relies on a subset of the ItaParlCorpus (Cova, 2025), a large-scale, machine-readable corpus enriched with temporal, institutional, and political metadata. Beyond topic extraction, this work addresses a largely unexplored challenge: formalizing, from a linguistic perspective, the topics produced by unsupervised, embedding-based topic modeling as Linked Data entities. Extracted topics are formalized as semantic entities by reusing the OntoLex-Lemon model and its FrAC extension, and by declaring a dedicated ontology that links topics to speeches, speakers, political parties, and temporal information through standardized vocabularies and persistent URIs. This integration enables semantic querying through SPARQL, supporting analyses of topic distributions across political actors and parties and illustrating the analytical potential of the proposed approach. Moreover, the study highlights limitations in the formalization of topic modeling outputs, particularly regarding the representation of ambiguous word forms and their alignment with lexical concepts in OntoLex-Lemon.