What Stories Do Language Models Tell About Nature? A Multi-Layer Evaluation Framework for Ecological Alignment

Proceedings of the 2nd Workshop on Ecology, Environment, and Natural Language Processing

Abstract

Large language models increasingly generate environmental discourse, yet there is no standardised framework for evaluating the ecological narratives they produce. We introduce a structured prompt corpus and a reproducible multi-layer evaluation framework grounded in ecolinguistic theory. The framework operationalises five dimensions of ecological alignment: anthropocentrism, agency attribution, erasure of non-human impacts, evaluation of growth, and responsibility framing. It integrates human judgement, an ecosophy-aligned model judge, and automated semantic metrics, and we apply it to outputs from ChatGPT, DeepSeek, and Ecophora, our ecosophy-guided model. Ecophora achieves the highest alignment across all layers, with near-ceiling judge scores of 159/160 and 142/160, together with the strongest composite performance on the automated metrics. Divergences between the automated metrics and holistic judgement indicate that ecological vocabulary alone does not guarantee ecological reasoning. The proposed framework provides a scalable methodology for benchmarking ecological alignment and for assessing narrative shifts in language models.