HomeLREC 2020WorkshopsSLTUlrec2020-ws-sltu-21
Back to SLTU 2020
LREC 2020workshop

Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages

Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

DOI:10.63317/4pz4s2vurgjk

Abstract

This article presents the strategy for developing a platform containing Language Processing Chains for European Union languages, consisting of Tokenization to Parsing, also including Named Entity recognition and with addition of Sentiment Analysis. These chains are part of the first step of an event-centric knowledge processing pipeline whose aim is to process multilingual media information about major events that can cause an impact in Europe and the rest of the world. Due to the differences in terms of availability of language resources for each language, we have built this strategy in three steps, starting with processing chains for the well-resourced languages and finishing with the development of new modules for the under-resourced ones. In order to classify all European Union official languages in terms of resources, we have analysed the size of annotated corpora as well as the existence of pre-trained models in mainstream Language Processing tools, and we have combined this information with the proposed classification published at META-NET whitepaper series.

Details

Paper ID
lrec2020-ws-sltu-21
Pages
pp. 153-158
BibKey
alves-etal-2020-natural
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Location
undefined, undefined
Date
11 May 2020 16 May 2020

Authors

  • DA

    Diego Alves

  • GT

    Gaurish Thakkar

  • MT

    Marko Tadić

Links