Exploration of Sentence Representations in Spanish BERT-like Models

Proceedings of LANLP: Bridging Ibero and Latin American NLP Communities

Abstract

Transformer-based language models, now ubiquitous in NLP, generate internal representations (embeddings) of words and sentences. Yet systematic comparisons of embedding strategies across models remain limited. In this work, we evaluate Spanish embeddings from several BERT-like models (BETO, multilingual BERT, XLM-RoBERTa, ROUBERTa) to understand their syntactic and semantic capabilities across layers. We propose novel sentence-level analogy tests to probe generalization. Results show that tasks such as verb negation or word reordering are best handled by embeddings from earlier layers, while nuanced semantic distinctions, such as agent or patient gender, are better captured by deeper layers. Our findings provide guidelines for choosing embedding strategies and offer a foundation for further NLP research.
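
To make the idea of a sentence-level analogy test concrete, the following is a minimal sketch, assuming the common vector-offset formulation (a : b :: c : ?), in which the answer is the candidate closest in cosine similarity to c + (b - a). The abstract does not specify the scoring protocol, so this formulation, the helper names `cosine` and `solve_analogy`, and the three-dimensional toy vectors are all illustrative assumptions; in practice the vectors would be sentence embeddings taken from a chosen layer of one of the models.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def solve_analogy(a, b, c, candidates):
    """Rank candidates for the analogy a : b :: c : ?.

    Assumed formulation (not taken from the paper): the target is the
    offset vector c + (b - a), and candidates are scored by cosine
    similarity to that target. Returns the index of the best candidate
    together with all scores.
    """
    target = c + (b - a)
    scores = [cosine(target, cand) for cand in candidates]
    return int(np.argmax(scores)), scores


# Hypothetical toy embeddings standing in for real sentence vectors.
# The second dimension plays the role of a "negation" direction.
affirmative_1 = np.array([1.0, 0.0, 0.2])  # sentence 1, affirmative
negated_1 = np.array([1.0, 1.0, 0.2])      # sentence 1, negated
affirmative_2 = np.array([0.0, 0.1, 1.0])  # sentence 2, affirmative

candidates = [
    np.array([0.0, 1.1, 1.0]),  # sentence 2, negated (expected answer)
    np.array([0.0, 0.1, 1.0]),  # sentence 2, unchanged
    np.array([1.0, 0.0, 0.2]),  # sentence 1, affirmative
]

best, scores = solve_analogy(affirmative_1, negated_1, affirmative_2, candidates)
print("best candidate index:", best)
```

Running the same test with embeddings extracted from each layer in turn, and recording how often the expected candidate ranks first, is one plausible way to obtain the kind of layer-wise comparison summarized above.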