Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/2od3bcjgbz8n

Abstract

Sorani Kurdish lacks capitalization, one of the most useful orthographic features for named-entity recognition, because its Perso-Arabic script does not distinguish uppercase from lowercase letters. We describe a system that derives an inferred capitalization value from closely related languages on the basis of phonological similarity, and illustrate the system using several related Western Iranian languages.
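
The idea of borrowing capitalization from a related language can be illustrated with a minimal sketch. It is not the system described in the paper: it assumes the Sorani token has already been romanized, it uses plain string similarity from Python's `difflib` as a crude stand-in for phonological similarity, and the lexicon `BRIDGE_LEXICON`, the function names, and the threshold value are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon for a Latin-script bridge language.
# The entries are illustrative only and are not taken from the paper's data.
BRIDGE_LEXICON = ["Kurdistan", "Hewlêr", "bajar", "kur", "Silêmanî"]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1].

    Assumption: this stands in for a real phonological similarity measure.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def infer_capitalization(token, lexicon=BRIDGE_LEXICON, threshold=0.7):
    """Return (is_capitalized, best_match, score) for a romanized token.

    is_capitalized and best_match are None when no lexicon entry is
    similar enough to the token to justify an inference.
    """
    best = max(lexicon, key=lambda word: similarity(token, word))
    score = similarity(token, best)
    if score < threshold:
        return None, None, score
    return best[0].isupper(), best, score


if __name__ == "__main__":
    for tok in ["kurdistan", "hewler", "bajar", "xyz"]:
        print(tok, infer_capitalization(tok))
```

A token whose closest bridge-language match is capitalized receives a positive inferred value, which could then be passed to a named-entity recognizer as an additional feature. The threshold keeps the sketch from guessing when no plausible cognate is found.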

Details

Paper ID
lrec2016-main-529
Pages
pp. 3318-3324
BibKey
littell-etal-2016-bridge
Editors
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 - 28 May 2016

Authors

  • Patrick Littell
  • David R. Mortensen
  • Kartik Goyal
  • Chris Dyer
  • Lori Levin
