LREC 2026 workshop

A New State-of-the-Art BERT Model for Judeo-Arabic

Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @ LREC 2026

DOI: 10.63317/52qpsk7c5bfc

Abstract

We present JABERT, the first BERT model pretrained specifically for historical Judeo-Arabic texts. We demonstrate that JABERT outperforms Arabic and multilingual models on the downstream task of Judeo-Arabic homograph disambiguation. To evaluate performance on this task, we curated and annotated the first Judeo-Arabic homograph test set. We release both JABERT and the Judeo-Arabic homograph test set to the public for unrestricted use.

Details

Paper ID: lrec2026-ws-lt4hala-06
Pages: pp. 58-71
BibKey: rosensweig-etal-2026-new
Editors: Rachele Sprugnoli, Marco Passarotti
Publisher: European Language Resources Association (ELRA)
ISSN: N/A
ISBN: N/A
Workshop: Proceedings of the Fourth Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2026) @ LREC 2026
Location: Palma, Mallorca, Spain
Date: 11-16 May 2026

Authors

  • Elisha Rosensweig
  • Yitzchak Lindenbaum
  • Hillel Gershuni
  • Vered Raziel-Kretzmer
  • Daniel Caine
  • Avi Shmidman
