LREC 2020 workshop

A Comparison of Unsupervised Methods for Ad hoc Cross-Lingual Document Retrieval

Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

DOI:10.63317/3h5z74soo3be

Abstract

We address the problem of linking related documents across languages in a multilingual collection. We evaluate three diverse unsupervised methods to represent and compare documents: (1) a multilingual topic model; (2) cross-lingual document embeddings; and (3) Wasserstein distance. We test the performance of these methods in retrieving news articles in Swedish that are known to be related to a given Finnish article. The results show that ensembles of the methods outperform the stand-alone methods, suggesting that they capture complementary characteristics of the documents.

Details

Paper ID
lrec2020-ws-clssts-06
Pages
pp. 32-37
BibKey
zosa-etal-2020-comparison
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
Date
11-16 May 2020

Authors

  • Elaine Zosa

  • Mark Granroth-Wilding

  • Lidia Pivovarova

Links