
Masrad: Arabic Terminology Management Corpora with Semi-Automatic Construction

Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI:10.63317/4azqkq8r4eez

Abstract

This paper presents Masrad (Arabic for "glossary"), a terminology dataset for Arabic terminology management, together with a method and supporting tools for its semi-automatic construction. The entries in Masrad are pairs (f, a), where f is a foreign (non-Arabic) term that appears in specialized, academic, or field-specific books next to its Arabic counterpart a. Masrad-Ex systematically extracts these pairs as a first step in constructing Masrad. Masrad helps improve term consistency in academic translations and specialized Arabic documents, and helps automate cross-lingual text processing. Masrad-Ex leverages translated terms that occur organically in Arabic books and considers several candidate pairs for each term phrase. The candidate Arabic terms occur next to the foreign terms and vary in length. Masrad-Ex computes lexicographic, phonetic, morphological, and semantic similarity metrics for each candidate pair, and uses heuristic, machine learning, and machine learning with post-processing approaches to select the best candidate. This paper presents Masrad after thorough expert review and makes it available to the interested research community. The best-performing Masrad-Ex approach achieved 90.5% precision and 92.4% recall.
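The candidate-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the Masrad-Ex implementation: it assumes candidates are the Arabic token spans of varying length that end right before the foreign term, and it scores them with a single toy phonetic measure (a rough letter-by-letter transliteration compared using `difflib`). The function names, the `TRANSLIT` map, and the `max_len` parameter are all hypothetical; the paper combines lexicographic, phonetic, morphological, and semantic metrics with heuristic and machine learning approaches.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately incomplete Arabic-to-Latin letter map (an assumption).
# Letters missing from the map are simply dropped.
TRANSLIT = {
    "ا": "a", "ب": "b", "ت": "t", "ر": "r", "ك": "k", "ل": "l",
    "م": "m", "ن": "n", "و": "o", "ي": "i", "س": "s", "د": "d", "ف": "f",
}


def transliterate(text):
    """Map each known Arabic letter to a Latin letter; drop everything else."""
    return "".join(TRANSLIT.get(ch, "") for ch in text)


def candidates(arabic_tokens, max_len=3):
    """Return spans of 1..max_len tokens that end at the last Arabic token,
    i.e. the spans adjacent to the foreign term (assumed to follow them)."""
    n = min(max_len, len(arabic_tokens))
    return [" ".join(arabic_tokens[-k:]) for k in range(1, n + 1)]


def phonetic_score(foreign, arabic):
    """Toy phonetic similarity in [0, 1] between a foreign term and an Arabic span."""
    return SequenceMatcher(None, foreign.lower(), transliterate(arabic)).ratio()


def best_candidate(foreign, arabic_tokens, max_len=3):
    """Pick the adjacent Arabic span with the highest toy phonetic score."""
    return max(candidates(arabic_tokens, max_len),
               key=lambda span: phonetic_score(foreign, span))


if __name__ == "__main__":
    tokens = ["يستخدم", "هذا", "بروتوكول"]  # Arabic context preceding "protocol"
    print(best_candidate("protocol", tokens))
```

A phonetic score alone only works for transliterated loanwords; translated terms with no sound overlap would need the semantic and morphological signals the abstract mentions.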

Details

Paper ID
lrec2026-main-629
Pages
pp. 7918-7926
BibKey
nasser-etal-2026-masrad
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-493814-49-4
Conference
The Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Location
Palma, Mallorca, Spain
Date
11 May 2026 to 16 May 2026

Authors

  • Mahdi Nasser

  • Laura Sayah

  • Fadi Zaraket

Links