Request Correction

Use this form to request corrections to the paper metadata. Select the fields that need correction and provide the correct information.

Correction Guidelines

Click the edit button next to a field to report a correction.
Fill in the suggested correction value for each field you want to correct.
Provide your name and email so we can contact you if needed.


Paper Information

lrec2026-ws-osact-07

AlignAR: Generative Sentence Alignment for Arabic–English Parallel Corpora of Legal and Literary Texts


Paper Fields


Title

AlignAR: Generative Sentence Alignment for Arabic–English Parallel Corpora of Legal and Literary Texts

Abstract

High-quality parallel corpora are the backbone of advances in Machine Translation (MT) research and of effective translation pedagogy. Despite their importance, robust resources for the Arabic–English language pair remain scarce. Furthermore, existing datasets are often limited by their reliance on simple one-to-one sentence mappings, which fail to capture the structural complexities inherent in natural language translation. To address this gap, this paper presents AlignAR, a novel generative sentence alignment method, together with a comprehensive new Arabic–English dataset that juxtaposes simple legal documents with complex literary texts. Our evaluation shows that "Easy" datasets lack the discriminatory power to fully assess alignment methods. By reducing the proportion of one-to-one mappings in our "Hard" subset, we expose the limitations of traditional alignment techniques when faced with structural divergence. In contrast, Large Language Model (LLM)-based approaches show superior robustness and adaptability: the proposed LLM-based approaches achieve an overall F1-score of 85.5%, a nearly 9% improvement over previous methods. This study underscores the importance of challenging benchmarks and validates the efficacy of generative models in handling the intricacies of bitext alignment. The code and datasets are available on GitHub.

Authors

Expand an author to correct their information. Use the remove button to request author removal, or add a new author.

PDF Attachment

You may attach a PDF as a corrected version of the paper. Max file size: 10MB. Only PDF files are accepted.


Your Information

Name

Comment

Author Declaration *

I declare that I have notified all co-authors of the proposed corrections and obtained their consent, and that all modifications adhere to research ethics standards and the LREC correction policy.
