Unsupervised GRI-TCFD Alignment with LLM-Assisted Validation for Climate Disclosure and Greenwashing Risk Analysis

Proceedings of the 2nd Workshop on Ecology, Environment, and Natural Language Processing

Abstract

Climate-related corporate disclosures play a central role in sustainable finance and regulatory supervision, but they remain difficult to analyze because of their length, unstructured format, and strategic language. Existing NLP approaches have been applied to ESG scoring and greenwashing detection, but most operate at the document level and lack explicit alignment with formal reporting standards. We propose a scalable paragraph-level framework for aligning sustainability disclosures with the Global Reporting Initiative (GRI) indicators and the Task Force on Climate-related Financial Disclosures (TCFD) pillars. Our approach combines weak supervision, a climate-focused GRI-TCFD mapping, embedding-based semantic similarity, and LLM-assisted validation for climate detection. In parallel, we introduce a paragraph-level greenwashing proxy based on commitment intensity, claim specificity, and sentiment polarity. This proxy complements regulatory alignment by capturing linguistic signals associated with potentially symbolic climate communication. The resulting augmented dataset is used to fine-tune ClimateBERT models in both single-task and multi-task settings. Experimental results show that weakly supervised dataset augmentation improves robustness and generalization compared with training on manually annotated data alone, with further gains in the multi-task configuration. By integrating regulatory semantics, domain-adapted language models, and scalable annotation strategies, this study advances standards-aligned climate disclosure analysis and provides tools directly relevant to climate-related financial risk assessment.
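As an illustration of the embedding-based alignment step named in the abstract, the following minimal sketch assigns a paragraph to the most similar TCFD pillar by cosine similarity. It is not the authors' implementation: the one-line pillar descriptions, the `embed` and `align` helpers, and the 0.2 threshold are hypothetical, and a toy bag-of-words vector stands in for a real sentence embedding so that the example runs without external dependencies.

```python
import math
import re
from collections import Counter

# The four TCFD pillars, each with a HYPOTHETICAL one-line description
# (assumed for illustration; not the mapping text used in the paper).
TCFD_PILLARS = {
    "Governance": "board oversight and management role in climate related risks and opportunities",
    "Strategy": "impact of climate related risks and opportunities on business strategy and financial planning",
    "Risk Management": "processes to identify assess and manage climate related risks",
    "Metrics and Targets": "metrics targets and greenhouse gas emissions used to assess climate performance",
}


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(paragraph, threshold=0.2):
    """Return (best pillar or None, all scores); threshold is an assumed cutoff."""
    vec = embed(paragraph)
    scores = {name: cosine(vec, embed(desc)) for name, desc in TCFD_PILLARS.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


label, scores = align("The board oversees climate related risks through a dedicated committee.")
print(label)
```

In a realistic setting, `embed` would be replaced by a pretrained sentence encoder, and paragraphs falling below the threshold would be left unlabeled rather than forced into a pillar.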