Annotating “Particles” in Multiword Expressions in te reo Māori for a Part-of-Speech Tagger

Proceedings of the 18th Workshop on Multiword Expressions @LREC2022

DOI:10.63317/44txgw7m6e5c

Abstract

This paper discusses the development of a Part-of-Speech tagger for te reo Māori, the Indigenous language of Aotearoa, also known as New Zealand. Te reo Māori is a particularly analytical and polysemic language. The paper introduces a word class called “particles”: small multi-functional words with many meanings, for example ē, ai, noa, rawa, mai, anō and koa. These “particles” reflect the analytical and polysemous nature of te reo Māori. They frequently occur both singly and in multiword expressions, including time adverbial phrases. The paper illustrates the challenges that they presented to part-of-speech tagging. It also discusses how we overcome these challenges in a way that is appropriate for te reo Māori, given its status as an Indigenous language and its history of colonisation. This includes a discussion of the importance of accurately reflecting the conceptualization of te reo Māori, and of how this involved making no linguistic presumptions and eliciting faithful judgements from speakers in a way that is uninfluenced by linguistic terminology.

Details

Paper ID
lrec2022-ws-mwe-10
Pages
pp. 67-74
BibKey
finn-etal-2022-annotating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
Date
20 June 2022 to 25 June 2022

Authors

  • Aoife Finn

  • Suzanne Duncan

  • Peter-Lucas Jones

  • Gianna Leoni

  • Keoni Mahelona

Links