LREC 2014 (main conference)

Finite-state morphological transducers for three Kypchak languages

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/467uebniwrpx

Abstract

This paper describes the development of free/open-source finite-state morphological transducers for three Turkic languages (Kazakh, Tatar, and Kumyk), one from each of the three sub-branches of the Kypchak branch of Turkic. The transducers were built with the Helsinki Finite-State Toolkit (HFST). The paper also describes how the transducer for each subsequent closely related language took less time to develop. An evaluation shows that all three transducers have reasonable coverage (around 90%) on freely available corpora of the languages, and high precision on a manually verified test set.

Details

Paper ID
lrec2014-main-143
Pages
pp. 3378-3385
BibKey
washington-etal-2014-finite
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26–31 May 2014

Authors

  • Jonathan Washington
  • Ilnar Salimzyanov
  • Francis Tyers
