GATE-Reranker: A Strong Arabic Cross-Encoder for Document Reranking

The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks

Abstract

Arabic information retrieval increasingly relies on multi-stage pipelines in which a fast first-stage retriever produces candidate passages and a neural reranker refines their relevance ordering. Transformer cross-encoders deliver strong effectiveness through joint query–passage encoding, and multilingual rerankers achieve competitive performance on Arabic benchmarks. However, systematic analysis of calibration, robustness, and deployment behavior in Arabic-specific settings remains limited. We present GATE-Reranker, a compact Arabic cross-encoder initialized from an Arabic semantic embedding backbone and fine-tuned on large-scale mMARCO-style Arabic triplets. The model scores each query–passage pair via full self-attention and a lightweight regression head, enabling plug-and-play second-stage reranking for Arabic search and retrieval-augmented generation (RAG) systems. We evaluate on three Arabic benchmarks covering binary relevance discrimination, controlled multi-negative reranking, and large-scale mMARCO evaluation. GATE-Reranker remains competitive with strong multilingual rerankers in ranking effectiveness while demonstrating significantly improved calibration and discriminative behavior. These properties translate into more reliable downstream performance in retrieval and RAG pipelines, and the model maintains low GPU memory usage and latency on a Tesla T4.
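
As an illustration of the second-stage reranking step described in the abstract, the following is a minimal sketch, not the authors' implementation. The function names `rerank` and `toy_overlap_scorer` are hypothetical, and the token-overlap scorer is only a stand-in for the cross-encoder so that the example runs without model weights. In a real pipeline the scorer would wrap the fine-tuned model, for example through the `CrossEncoder.predict` method of the sentence-transformers library.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A pairwise scorer maps (query, passage) to a relevance score.
Scorer = Callable[[str, str], float]


def toy_overlap_scorer(query: str, passage: str) -> float:
    """Hypothetical stand-in for the cross-encoder.

    Returns the fraction of whitespace-separated query tokens that
    also occur in the passage. A real reranker would instead encode
    the pair jointly and apply a regression head.
    """
    q_tokens = set(query.split())
    p_tokens = set(passage.split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & p_tokens) / len(q_tokens)


def rerank(
    query: str,
    candidates: Sequence[str],
    scorer: Scorer,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every first-stage candidate against the query and sort
    by descending score. Ties keep their first-stage order because
    Python's sort is stable."""
    scored = [(passage, scorer(query, passage)) for passage in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Candidates in (deliberately poor) first-stage order.
query = "عاصمة مصر"
candidates = [
    "كرة القدم رياضة شعبية",
    "الرياض عاصمة السعودية",
    "القاهرة هي عاصمة مصر",
]
ranked = rerank(query, candidates, toy_overlap_scorer)
for passage, score in ranked:
    print(f"{score:.2f}\t{passage}")
```

Keeping the scorer as a plain callable is one way to make the reranker plug-and-play: the first-stage retriever and the ranking logic stay unchanged when the scoring model is swapped.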