LREC 2020 workshop

The 2019 BBN Cross-lingual Information Retrieval System

Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

DOI:10.63317/4n3o6npq6yd5

Abstract

In this paper, we describe a cross-lingual information retrieval (CLIR) system that, given a query in English and a set of audio and text documents in a foreign language, returns a scored list of relevant documents and presents its findings as a summary in English. Foreign audio documents are first transcribed by a state-of-the-art pretrained multilingual speech recognition model that is fine-tuned to the target language. For text documents, we use multiple multilingual neural machine translation (MT) models to achieve good translation results, especially for low- and medium-resource languages. The processed documents and queries are then scored using a probabilistic CLIR model that makes use of translation probabilities from GIZA translation tables and scores from a Neural Network Lexical Translation Model (NNLTM). Additionally, advanced score normalization, combination, and thresholding schemes are employed to maximize the Average Query Weighted Value (AQWV) scores. The CLIR output, together with multiple translation renderings, is selected and translated into English snippets via a summarization model. Our turnkey system is language-agnostic and can be quickly trained for a new low-resource language in a few days.
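To make the AQWV objective mentioned in the abstract concrete, here is a minimal sketch of how the metric is commonly computed: for each query, the value is one minus the miss rate plus a weighted false-alarm rate, averaged over queries that have at least one relevant document. The function name `aqwv`, the dictionary-based inputs, and the default `beta = 40.0` are assumptions made for this illustration and are not taken from the paper, which may use a different weight and data layout.

```python
from typing import Dict, Set


def aqwv(returned: Dict[str, Set[str]],
         relevant: Dict[str, Set[str]],
         num_docs: int,
         beta: float = 40.0) -> float:
    """Average Query Weighted Value (illustrative sketch).

    returned: query id -> documents the system returned above its threshold
    relevant: query id -> documents judged relevant to that query
    num_docs: total number of documents in the collection
    beta:     weight on false alarms (assumed value, not from the paper)
    """
    values = []
    for query, rel in relevant.items():
        if not rel:
            # The miss rate is undefined without relevant documents,
            # so such queries are skipped.
            continue
        ret = returned.get(query, set())
        p_miss = len(rel - ret) / len(rel)
        non_relevant = num_docs - len(rel)
        p_fa = len(ret - rel) / non_relevant if non_relevant > 0 else 0.0
        values.append(1.0 - (p_miss + beta * p_fa))
    if not values:
        raise ValueError("no query has any relevant documents")
    return sum(values) / len(values)


if __name__ == "__main__":
    relevant = {"q1": {"d1", "d2"}, "q2": {"d3"}}
    returned = {"q1": {"d1", "d9"}, "q2": {"d3"}}
    print(round(aqwv(returned, relevant, num_docs=101), 4))
```

Because false alarms are weighted by `beta`, returning one extra irrelevant document can cost more than missing a relevant one. This is why the decision threshold applied to the normalized scores matters so much for this metric.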

Details

Paper ID
lrec2020-ws-clssts-08
Pages
pp. 44-51
BibKey
zhang-etal-2020-2019
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)
Date
11-16 May 2020

Authors

  • Le Zhang
  • Damianos Karakos
  • William Hartmann
  • Manaj Srivastava
  • Lee Tarlin
  • David Akodes
  • Sanjay Krishna Gouda
  • Numra Bathool
  • Lingjun Zhao
  • Zhuolin Jiang
  • Richard Schwartz
  • John Makhoul

Links