HomeLREC 2022WorkshopsSIGULlrec2022-ws-sigul-18
Back to SIGUL 2022
LREC 2022workshop

Question Answering Classification for Amharic Social Media Community Based Questions

Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages

DOI:10.63317/3mwc2swiw752

Abstract

In this work, we build a Question Answering (QA) classification dataset from a social media platform, namely the Telegram public channel called @AskAnythingEthiopia. The channel has more than 78k subscribers and has existed since May 31, 2019. The platform allows asking questions that belong to various domains, like politics, economics, health, education, and so on. Since the questions are posed in a mixed-code, we apply different strategies to pre-process the dataset. Questions are posted in Amharic, English, or Amharic but in a Latin script. As part of the pre-processing tools, we build a Latin to Ethiopic Script transliteration tool. We collect 8k Amharic and 24K transliterated questions and develop deep learning-based questions answering classifiers that attain as high as an F-score of 57.29 in 20 different question classes or categories. The datasets and pre-processing scripts are open-sourced to facilitate further research on the Amharic community-based question answering.

Details

Paper ID
lrec2022-ws-sigul-18
Pages
pp. 137-145
BibKey
destaw-etal-2022-question
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Location
undefined, undefined
Date
20 June 2022 25 June 2022

Authors

  • TB

    Tadesse Destaw Belay

  • SY

    Seid Muhie Yimam

  • AA

    Abinew Ayele

  • CB

    Chris Biemann

Links