LREC 2018 (main conference)

Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/3rob6twf9o8r

Abstract

This paper presents work in progress on a multilayered, syntactically and semantically annotated text corpus for Latvian. The broad application area we address is natural language understanding (NLU), while the more specific applications are abstractive text summarization and knowledge base population, which are required by the project's industrial partner, the Latvian information agency LETA, for the automation of various media monitoring processes. Both the multilayered corpus and the downstream applications are anchored in cross-lingual state-of-the-art representations: Universal Dependencies (UD), FrameNet, PropBank and Abstract Meaning Representation (AMR). In this paper, we focus in particular on the consecutive annotation of the treebank and framebank layers. We also draw links to the ultimate AMR layer and to the auxiliary named entity and coreference annotation layers. Since we are aiming at a medium-sized yet still general-purpose corpus for a less-resourced language, an important aspect we consider is the variety and balance of the corpus in terms of genres, authors and lexical units.

Details

Paper ID
lrec2018-main-714
Pages
N/A
BibKey
gruzitis-etal-2018-creation
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 to 12 May 2018

Authors

  • Normunds Gruzitis
  • Lauma Pretkalnina
  • Baiba Saulite
  • Laura Rituma
  • Gunta Nespore-Berzkalne
  • Arturs Znotins
  • Peteris Paikens
