LREC-COLING 2024 main

TED-EL: A Corpus for Speech Entity Linking

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/4yjsa2ynccmu

Abstract

Speech entity linking aims to recognize mentions from speech and link them to entities in knowledge bases. Previous work on entity linking mainly focuses on visual context and text context. In contrast, speech entity linking focuses on audio context. In this paper, we first propose the speech entity linking task. To facilitate the study of this task, we propose the first speech entity linking dataset, TED-EL. Our corpus is a high-quality, human-annotated parallel dataset of audio, text, and mention-entity pairs derived from Technology, Entertainment, Design (TED) talks, and it covers a wide range of entity types (24 types). Based on TED-EL, we designed two types of models: ranking-based and generative speech entity linking models. We conducted experiments on the TED-EL dataset for both types of models. The results show that the ranking-based models outperform the generative models, achieving an F1 score of 60.68%.

Details

Paper ID
lrec2024-main-1365
Pages
pp. 15721-15731
BibKey
li-etal-2024-ted
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 – 25 May 2024

Authors

  • Silin Li
  • Ruoyu Song
  • Tianwei Lan
  • Zeming Liu
  • Yuhang Guo

Links