LREC 2022 workshop

SimRelUz: Similarity and Relatedness Scores as a Semantic Evaluation Dataset for Uzbek Language

Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages

DOI:10.63317/2o9s3wek2mj5

Abstract

Semantic relatedness between words is one of the core concepts in natural language processing, which makes semantic evaluation an important task. In this paper, we present a semantic model evaluation dataset: SimRelUz, a collection of similarity and relatedness scores of word pairs for the low-resource Uzbek language. The dataset consists of more than a thousand word pairs, carefully selected based on their morphological features, occurrence frequency, and semantic relation, and annotated by eleven native Uzbek speakers of different age groups and genders. We also paid attention to the problem of rare and out-of-vocabulary words, so that the robustness of semantic models can be evaluated thoroughly.

Details

Paper ID
lrec2022-ws-sigul-26
Pages
pp. 199-206
BibKey
salaev-etal-2022-simreluz
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Date
20 June 2022 to 25 June 2022

Authors

  • Ulugbek Salaev
  • Elmurod Kuriyozov
  • Carlos Gómez-Rodríguez

Links