Ecological Discourse Modeling in a Low-Resource Setting: A Longitudinal Vietnamese Climate Corpus with Comparative Topic Modeling

Proceedings of the 2nd Workshop on Ecology, Environment, and Natural Language Processing

DOI:10.63317/4n8q3zryqvju

Abstract

Climate change discourse has expanded substantially in recent decades, yet computational analyses remain concentrated on high-resource languages. In this paper, we construct a longitudinal Vietnamese climate news corpus and examine thematic structure and temporal evolution in a lower-resource setting. The corpus comprises 10,401 articles published between 2004 and 2026 and is systematically preprocessed using linguistically informed word segmentation. To ensure domestic relevance, we apply transformer-based Named Entity Recognition and construct a geographically grounded subset of 4,501 Vietnam-focused documents. We analyze this dataset using both Latent Dirichlet Allocation and BERTopic. Results reveal stable thematic dimensions alongside longitudinal shifts from event-driven pollution reporting toward governance- and energy-centered narratives. Embedding-based modeling achieves higher semantic coherence while maintaining comparable topic diversity. The main contribution of this work is thus the compilation of a structured Vietnamese climate corpus and a systematic analysis of discourse evolution in an underrepresented language context.
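The abstract compares the two models on topic diversity without stating how it is computed. One common definition is the fraction of unique words among the top-k words of all topics. The sketch below assumes that definition. The function name `topic_diversity`, the `top_k` parameter, and the toy topics are hypothetical illustrations, not the paper's code or data.

```python
def topic_diversity(topics, top_k=10):
    """Return the fraction of unique words among the top_k words of all topics.

    `topics` is a list of word lists, each ordered by descending topic weight.
    A value of 1.0 means no word is shared between topics; lower values mean
    more overlap. This definition is an assumption, not taken from the paper.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics (illustrative English glosses, not from the corpus).
toy_topics = [
    ["pollution", "air", "dust", "city"],
    ["energy", "solar", "wind", "policy"],
    ["policy", "emission", "carbon", "city"],
]

# 12 top words in total, 10 of them distinct ("policy" and "city" repeat).
score = topic_diversity(toy_topics, top_k=4)
print(f"topic diversity: {score:.3f}")
```

Because the measure depends only on each model's top-word lists, the same function could be applied to output from both Latent Dirichlet Allocation and BERTopic once their topics are exported as ranked word lists.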

Details

Paper ID: lrec2026-ws-nlp4ecology-07
Pages: pp. 69-78
BibKey: nguyen-2026-ecological
Editors: Francesca Grasso, Valerio Basile, Cristina Bosco, Muhammad Okky Ibrohim, Maria Skeppstedt, Manfred Stede
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the 2nd Workshop on Ecology, Environment, and Natural Language Processing
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Huyen Phuong Nguyen