LREC 2026 workshop

Exploration of Sentence Representations in Spanish BERT-like Models

Proceedings of LANLP: Bridging Ibero and Latin American NLP Communities

DOI:10.63317/5moein87oxiw

Abstract

Transformer-based language models, now ubiquitous in NLP, generate internal representations (embeddings) of words and sentences. Yet systematic comparisons of embedding strategies across models remain limited. In this work, we evaluate Spanish embeddings from several BERT-like models (BETO, multilingual BERT, XLM-RoBERTa, ROUBERTa) to understand their syntactic and semantic capabilities across layers. We propose novel sentence-level analogy tests to probe generalization. Results show that tasks such as verb negation or word reordering perform best with embeddings from earlier layers, while nuanced semantic distinctions, such as agent or patient gender, are better captured by deeper layers. Our findings provide guidelines for choosing embedding strategies and offer a foundation for further NLP research.

Details

Paper ID: lrec2026-ws-lanlp-08
Pages: pp. 55-65
BibKey: herrera-etal-2026-exploration
Editors: German Rigau Claramunt, Pablo Gamallo, Rafael Muñoz Guillena, Luis Chiruzzo, Eugenio Martínez Cámara
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of LANLP: Bridging Ibero and Latin American NLP Communities
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Gonzalo Herrera
  • Aiala Rosá
  • Luis Chiruzzo

Links