LREC 2016 (main conference)
Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Abstract
Capitalization is one of the most useful orthographic features for named-entity recognition, but it is absent in Sorani Kurdish, whose Perso-Arabic script does not distinguish between uppercase and lowercase letters. We describe a system that infers a capitalization value for Sorani words from closely related languages written in scripts that do mark case, matching words across languages by phonological similarity. We illustrate the system using several related Western Iranian languages (Kurmanji, Zazaki, and Tajik).
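The abstract does not specify the transliteration scheme, the similarity measure, or how bridge-language evidence is aggregated, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a hypothetical, deliberately tiny letter-by-letter romanization of Sorani (`SORANI_TO_LATIN`), a single Kurmanji bridge lexicon with toy capitalization counts (`KURMANJI_CASE_COUNTS`), an arbitrary match threshold of 0.8, and Python's `difflib.SequenceMatcher` ratio as a crude stand-in for a phonological similarity measure. None of these names, values, or choices come from the paper.

```python
from difflib import SequenceMatcher

# HYPOTHETICAL, deliberately tiny Perso-Arabic -> Latin mapping for Sorani.
# A real system would need a complete, phonologically informed transliteration.
SORANI_TO_LATIN = {
    "\u0627": "a",  # alef
    "\u0628": "b",  # beh
    "\u062a": "t",  # teh
    "\u062f": "d",  # dal
    "\u0631": "r",  # reh
    "\u0633": "s",  # seen
    "\u06a9": "k",  # keheh (the kaf used in Sorani)
    "\u0646": "n",  # noon
    "\u0648": "u",  # waw (simplified: can also be w)
}

# HYPOTHETICAL bridge lexicon for Kurmanji (Latin script):
# lowercased form -> (capitalized occurrences, total occurrences).
# Toy numbers for illustration only, not real corpus statistics.
KURMANJI_CASE_COUNTS = {
    "kurdistan": (48, 50),
    "baran": (3, 60),
}


def romanize(sorani_word: str) -> str:
    """Map each Sorani letter to a rough Latin equivalent, dropping unmapped letters."""
    return "".join(SORANI_TO_LATIN.get(ch, "") for ch in sorani_word)


def similarity(a: str, b: str) -> float:
    """Surface string similarity in [0, 1]; a stand-in for phonological similarity."""
    return SequenceMatcher(None, a, b).ratio()


def infer_capitalization(sorani_word, bridge_counts, threshold=0.8):
    """Return (best bridge match, estimated P(capitalized)), or None if nothing is close."""
    key = romanize(sorani_word)
    best_word, best_score = None, 0.0
    for bridge_word in bridge_counts:
        score = similarity(key, bridge_word)
        if score > best_score:
            best_word, best_score = bridge_word, score
    if best_word is None or best_score < threshold:
        return None
    capitalized, total = bridge_counts[best_word]
    # Transfer the bridge word's observed capitalization rate to the Sorani word.
    return best_word, capitalized / total
```

For example, `infer_capitalization("کوردستان", KURMANJI_CASE_COUNTS)` romanizes the word to `kurdstan`, matches the bridge form `kurdistan`, and returns `("kurdistan", 0.96)`, which a downstream named-entity recognizer could use as a capitalization feature. Using several bridge languages would require one romanization per script (Tajik, for instance, is written in Cyrillic) and some rule for combining their estimates; that aggregation step is left open here.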