

Paper Information

lrec2026-ws-rail-05

Benchmarking Text Embedding Models for South African Languages



Abstract

In this work, we introduce a collection of monolingual embedding models for ten South African languages in four different architectures. To determine the quality of the embedding models, we evaluate the embeddings on two sequence-labelling tasks, namely Part-of-Speech (POS) tagging and Named Entity Recognition (NER). Languages are grouped into conjunctive (isiNdebele, isiXhosa, isiZulu, and Siswati), disjunctive (Sepedi, Sesotho, Setswana, Tshivenḓa, and Xitsonga), and Afrikaans to establish the influence of training data set size and typology on the quality of the different embeddings. To isolate representation effects, we train BiLSTM-CRF taggers while keeping the architecture, data splits, and training budget fixed, varying only the input embedding representations, namely GloVe, fastText, Flair, and RoBERTa. In our experiments, GloVe lags behind fastText, Flair, and the transformer-based models, confirming that static word-level vectors are less suited to morphologically complex, low-resource languages. Subword-aware embeddings such as fastText remain a reliable and computationally efficient baseline, while Flair is the most competitive overall across both POS tagging and NER tasks.

