Request Correction
Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.
Correction Guidelines
- Click the edit button next to a field to report a correction.
- Fill in the suggested correction value for each field you want to correct.
- Provide your name and email so we can contact you if needed.
Paper Information
NAJD-MT: High-Fidelity Saudi Najdi–English Training Data for Bidirectional Neural Machine Translation
Paper Fields
Dialectal Arabic remains significantly underrepresented in parallel resources for direct machine translation with English, particularly for regional varieties such as Saudi Najdi Arabic. In this work, we introduce NAJD-MT, a systematically constructed Saudi Najdi–English parallel corpus designed for training bidirectional neural machine translation models. Starting from the Saudi Arabic Dialectal Annotated (SADA) dataset, we generate English translations using GPT-4.1 and then apply cross-lingual embedding-based cosine similarity filtering to improve semantic alignment and reduce translation noise. We analyze the impact of varying semantic similarity thresholds on corpus size and downstream translation performance. Using the constructed datasets, we train and evaluate multiple Transformer-based models, including NLLB-200, OPUS-MT, mBART, and AraT5v2, in both the Najdi→English and English→Najdi directions. Experimental results demonstrate that stricter semantic filtering (cosine ≥ 0.7) consistently improves translation quality despite reducing dataset size, indicating that data purity plays a critical role in dialectal machine translation training. Our findings provide a reproducible framework for constructing high-fidelity dialect–English parallel corpora and emphasize the importance of semantic alignment filtering in low-resource dialectal settings.
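The similarity filtering step described in the abstract can be illustrated with a minimal sketch. It assumes each source and target sentence has already been embedded by some cross-lingual encoder (the abstract does not name the model, so the vectors below are toy values), and the function names `cosine` and `filter_pairs` are hypothetical, not taken from the paper. Only the 0.7 threshold comes from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def filter_pairs(pairs, src_vecs, tgt_vecs, threshold=0.7):
    """Keep sentence pairs whose embedding similarity meets the threshold."""
    kept = []
    for pair, u, v in zip(pairs, src_vecs, tgt_vecs):
        if cosine(u, v) >= threshold:
            kept.append(pair)
    return kept


# Toy data: placeholder sentences and hand-made 2-d "embeddings" (assumptions).
pairs = [
    ("najdi sentence 1", "english sentence 1"),
    ("najdi sentence 2", "english sentence 2"),
]
src_vecs = [[1.0, 0.0], [1.0, 0.0]]
tgt_vecs = [[0.9, 0.1], [0.0, 1.0]]

kept = filter_pairs(pairs, src_vecs, tgt_vecs)
print(kept)
```

Raising `threshold` shrinks the retained corpus, which is the size-versus-quality trade-off the abstract reports studying.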
Authors
Expand an author to correct their information. Use the remove button to request author removal, or add a new author.
PDF Attachment
You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.
Your Information
Author Declaration *