
Use of Transformer-Based Models for Word-Level Transliteration of the Book of the Dean of Lismore

Proceedings of the 4th Celtic Language Technology Workshop within LREC2022

DOI:10.63317/4bierqhjekzo

Abstract

The Book of the Dean of Lismore (BDL) is a 16th-century Scottish Gaelic manuscript written in a non-standard orthography. In this work, we outline the problem of transliterating the text of the BDL into a standardised orthography, and perform exploratory experiments using Transformer-based models for this task. In particular, we focus on the task of word-level transliteration, and achieve a character-level BLEU score of 54.15 with our best model, a BART architecture pre-trained on the text of Scottish Gaelic Wikipedia and then fine-tuned on around 2,000 word-level parallel examples. Our initial experiments give promising results, but we highlight the shortcomings of our model, and discuss directions for future work.
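The character-level BLEU metric named above can be illustrated with a minimal sketch. It assumes a standard corpus-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) in which each character of a word is treated as a token. The function names `char_ngrams` and `char_bleu` are hypothetical, and the exact scoring configuration used by the authors is not stated here, so this is not a reproduction of their evaluation.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Count the character n-grams of a single word."""
    chars = list(word)
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def char_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU over characters, scaled to 0-100.

    Assumption: one reference per hypothesis and no smoothing, so the
    score is 0 if any n-gram order has no matches at all.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h = char_ngrams(hyp, n)
            r = char_ngrams(ref, n)
            # Counter intersection gives clipped match counts.
            matches[n - 1] += sum((h & r).values())
            totals[n - 1] += sum(h.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise outputs shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

Calling `char_bleu(predicted_words, gold_words)` on two equal-length lists of strings gives a single score for the whole test set. Because the sketch applies no smoothing, very short words contribute no higher-order n-grams, which is one reason a real evaluation might rely on an established BLEU implementation instead.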

Details

Paper ID
lrec2022-ws-cltw-13
Pages
pp. 94-98
BibKey
gow-smith-etal-2022-use
Publisher
European Language Resources Association (ELRA)
Workshop
Proceedings of the 4th Celtic Language Technology Workshop within LREC2022
Date
20-25 June 2022

Authors

  • Edward Gow-Smith
  • Mark McConville
  • William Gillies
  • Jade Scott
  • Roibeard Ó Maolalaigh
