LREC 2026 Main Conference

MaskedVerbalizer: Automatic Verbalizer Construction for Few-Shot Text Classification in Low-Resource Right-to-Left Languages

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/2ijipj4r77bn

Abstract

Text classification in low-resource right-to-left languages faces significant challenges due to the scarcity of annotated data and the morphological richness of languages such as Arabic, Urdu, Sindhi, and Pashto. Arabic and Urdu alone are spoken by over 380 million and 246 million people worldwide, respectively, and Pashto is the national language of Afghanistan. These large speaker populations highlight the importance of effective language technologies for these languages. While multilingual Pre-trained Language Models (PLMs) have shown promising results, they typically require large labeled datasets and computationally expensive fine-tuning to perform well, which makes them impractical in the low-resource settings described above. We therefore adopt a few-shot strategy (zero-, 4-, or 8-shot) to achieve results comparable to those of standard fine-tuning. In this work, we propose MaskedVerbalizer, a novel technique for few-shot text classification. Our method introduces an automatic verbalizer construction approach that generates class-specific label words in the 4-shot setting, eliminating the need for extensive manual intervention. Despite its simple model architecture, MaskedVerbalizer performs well on classification benchmarks. Experimental results show that our method addresses the core challenges of low-resource text classification and provides a practical, computationally efficient solution. We achieved accuracies of 90.43% and 92.72% with mBERT and XLM-RoBERTa, respectively, representing improvements of 25–30% over soft and automatic verbalizers. The code for MaskedVerbalizer is publicly available at https://github.com/Furqann-hue/MV.
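The abstract describes verbalizer construction only at a high level. As a rough illustration of the general idea behind prompt-based classification with an automatically built verbalizer, the sketch below maps a masked language model's probability distribution at the [MASK] position to classes through per-class label words chosen from a few labeled examples. It is not the MaskedVerbalizer algorithm: the scoring rule, the function names `build_verbalizer`, `classify`, and `mean_prob`, and the toy probability tables (standing in for real mBERT or XLM-RoBERTa outputs) are all hypothetical assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for a masked LM's output at the [MASK] position:
# candidate label word -> predicted probability.
Dist = Dict[str, float]


def mean_prob(dists: List[Dist], token: str) -> float:
    """Average probability of `token` across distributions (0.0 if the list is empty)."""
    if not dists:
        return 0.0
    return sum(d.get(token, 0.0) for d in dists) / len(dists)


def build_verbalizer(
    support: List[Tuple[Dist, str]], words_per_class: int = 2
) -> Dict[str, List[str]]:
    """Choose label words for each class from a few labeled examples.

    Hypothetical scoring rule: a token's score for class c is its mean [MASK]
    probability on class-c examples minus its mean probability on all other
    examples, so tokens the model predicts for every class rank low.
    """
    by_class: Dict[str, List[Dist]] = defaultdict(list)
    for dist, label in support:
        by_class[label].append(dist)
    vocab = {token for dist, _ in support for token in dist}

    verbalizer: Dict[str, List[str]] = {}
    for label, dists in by_class.items():
        others = [d for other, ds in by_class.items() if other != label for d in ds]
        scores = {t: mean_prob(dists, t) - mean_prob(others, t) for t in vocab}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(vocab, key=lambda t: (-scores[t], t))
        verbalizer[label] = ranked[:words_per_class]
    return verbalizer


def classify(dist: Dist, verbalizer: Dict[str, List[str]]) -> str:
    """Predict the class whose label words receive the most [MASK] probability."""
    return max(verbalizer, key=lambda c: sum(dist.get(w, 0.0) for w in verbalizer[c]))


# Toy support set (2 examples per class) with made-up probabilities.
support = [
    ({"match": 0.5, "news": 0.3, "team": 0.2}, "sports"),
    ({"team": 0.4, "news": 0.4, "match": 0.2}, "sports"),
    ({"vote": 0.5, "news": 0.3, "party": 0.2}, "politics"),
    ({"party": 0.4, "news": 0.4, "vote": 0.2}, "politics"),
]
verbalizer = build_verbalizer(support)
print(verbalizer)  # {'sports': ['match', 'team'], 'politics': ['vote', 'party']}
print(classify({"team": 0.6, "news": 0.3, "vote": 0.1}, verbalizer))  # sports
```

Subtracting the other classes' mean probability is what keeps a generic word such as "news", which the toy model predicts equally for both classes, out of the verbalizer; ranking by raw class probability alone would select it for every class.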

Details

Paper ID
lrec2026-main-059
Pages
pp. 795-804
BibKey
ullah-etal-2026-maskedverbalizer
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Faizad Ullah
  • Furqan Sikandar
  • Areeba Waqar
  • Faizan Ali
  • Muhammad Sohaib Ayub
  • Mubashar Mushtaq
  • Asim Karim
