LREC-COLING 2024 (main)

PWESuite: Phonetic Word Embeddings and Tasks They Facilitate

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

DOI:10.63317/544yw6oerb22

Abstract

Mapping words into a fixed-dimensional vector space is the backbone of modern NLP. While most word embedding methods successfully encode semantic information, they overlook phonetic information that is crucial for many tasks. We develop three methods that use articulatory features to build phonetically informed word embeddings. To address the inconsistent evaluation of existing phonetic word embedding methods, we also contribute a task suite to fairly evaluate past, current, and future methods. We evaluate both (1) intrinsic aspects of phonetic word embeddings, such as word retrieval and correlation with sound similarity, and (2) extrinsic performance on tasks such as rhyme and cognate detection and sound analogies. We hope our task suite will promote reproducibility and inspire future phonetic embedding research.

Details

Paper ID
lrec2024-main-1168
Pages
pp. 13344-13355
BibKey
zouhar-etal-2024-pwesuite
Editor
N/A
Publisher
European Language Resources Association (ELRA) and ICCL
ISSN
2522-2686
ISBN
979-10-95546-34-4
Conference
Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Location
Turin, Italy
Date
20 May 2024 to 25 May 2024

Authors

  • Vilém Zouhar

  • Kalvin Chang

  • Chenxuan Cui

  • Nate B. Carlson

  • Nathaniel Romney Robinson

  • Mrinmaya Sachan

  • David R. Mortensen

Links