Evaluating Transformer Model Family Representations Through Automated Essay Scoring

Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026

Abstract

Large Language Models have become central to Automated Essay Scoring (AES), typically through fine-tuned transformer encoders or prompt-based applications of decoder models. However, the representational capacity of decoder models as frozen embedding extractors remains largely unexplored. In this paper, we present a controlled comparison between encoder and decoder transformer embeddings for prompt-agnostic AES. Using regression models, we evaluate frozen representations across two English datasets. We also analyze scaling effects and the impact of integrating explicit linguistic features in hybrid configurations. Our results show that decoder embeddings consistently outperform encoder embeddings in embedding-only settings, with gains generalizing across holistic essay scoring and proficiency prediction. Scaling effects are modest, and hybrid models that combine contextual embeddings with linguistic features yield further improvements. Notably, frozen decoder embeddings achieve performance competitive with a fine-tuned BERT. These findings highlight the importance of representation-level properties in essay scoring.
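As a rough illustration of the setup the abstract describes (frozen embeddings fed to a regression model, optionally concatenated with linguistic features), the following is a minimal sketch and not the authors' implementation. The mean pooling, the ridge regressor, the array sizes, and all function names are assumptions made for illustration, and random arrays stand in for real model outputs, linguistic features, and essay scores.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions (assumed pooling)."""
    mask = attention_mask[..., None].astype(float)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def predict(X, w, b):
    return X @ w + b

rng = np.random.default_rng(0)
n_essays, n_tokens, dim, n_ling = 200, 12, 16, 4  # hypothetical sizes

# Random stand-ins for frozen transformer outputs and linguistic features.
token_embeddings = rng.normal(size=(n_essays, n_tokens, dim))
attention_mask = (rng.random((n_essays, n_tokens)) > 0.2).astype(int)
attention_mask[:, 0] = 1  # every essay has at least one real token
linguistic = rng.normal(size=(n_essays, n_ling))

# Embedding-only input, and hybrid input (embeddings + linguistic features).
essay_vectors = mean_pool(token_embeddings, attention_mask)
hybrid = np.concatenate([essay_vectors, linguistic], axis=1)

# Synthetic scores, generated only so the regressor has a target to fit.
scores = hybrid @ rng.normal(size=hybrid.shape[1])
scores = scores + rng.normal(scale=0.1, size=n_essays)

train, test = slice(0, 150), slice(150, None)
w, b = fit_ridge(hybrid[train], scores[train])
rmse = float(np.sqrt(np.mean((predict(hybrid[test], w, b) - scores[test]) ** 2)))
print(f"hybrid input shape: {hybrid.shape}, held-out RMSE: {rmse:.3f}")
```

Because the data here are synthetic, the printed error says nothing about the paper's findings. The sketch only shows where the embedding extractor stays frozen and where the linguistic features enter the regression input.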

Details

Paper ID

lrec2026-ws-readixtsar-11

Pages

pp. 142-150

DOI

10.63317/35ytj3qcrwv4

BibKey

ozten-etal-2026-evaluating

Editors

Matthew Shardlow, Thomas François, Raquel Amaro, Jorge Baptista, Rémi Cardon, Eugénio Ribeiro, Horacio Saggion, Regina Stodden, Amalia Todirascu, Rodrigo Wilkens

Publisher

European Language Resources Association (ELRA)

ISSN

N/A

ISBN

N/A

Workshop

Proceedings of the Joint Workshop on Readability and Text Simplification (READIxTSAR) @ LREC 2026

Location

Palma, Mallorca, Spain

Date

11 - 16 May 2026

Authors

Akchay Ozten
Rodrigo Wilkens
