LREC 2016, Main Conference

Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/2od3bcjgbz8n

Abstract

In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition -- capitalization -- is absent, as the language's Perso-Arabic script does not make a distinction between uppercase and lowercase letters. We describe a system for deriving an inferred capitalization value from closely related languages by phonological similarity, and illustrate the system using several related Western Iranian languages.
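The idea of borrowing a capitalization value from a related language can be illustrated with a minimal sketch. It is not the system described in the paper: it assumes the Sorani token has already been romanized, it uses plain Levenshtein distance as a crude stand-in for phonological similarity, and the function names, the `max_ratio` threshold, and the toy lexicon are all hypothetical.

```python
from typing import Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (a crude stand-in for phonological distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def infer_capitalization(token: str,
                         bridge_lexicon: Sequence[str],
                         max_ratio: float = 0.34) -> Optional[bool]:
    """Guess whether a romanized token would be capitalized.

    Finds the closest word in a cased bridge-language lexicon and returns
    True/False according to that word's first letter, or None when no
    entry is close enough. `max_ratio` is an arbitrary illustrative cutoff.
    """
    best_ratio, best_word = None, None
    for word in bridge_lexicon:
        if not word:
            continue  # skip empty entries
        d = edit_distance(token.lower(), word.lower())
        ratio = d / max(len(token), len(word))
        if best_ratio is None or ratio < best_ratio:
            best_ratio, best_word = ratio, word
    if best_word is None or best_ratio > max_ratio:
        return None
    return best_word[0].isupper()


# Toy bridge lexicon in a cased (Latin-script) orthography; hypothetical data.
lexicon = ["Kurdistan", "bajar"]
print(infer_capitalization("kurdistan", lexicon))  # True
print(infer_capitalization("bajer", lexicon))      # False
print(infer_capitalization("xyz", lexicon))        # None
```

In this sketch a token inherits the case of its nearest bridge-language neighbor, and tokens with no sufficiently close neighbor are left undecided rather than guessed.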

Details

Paper ID
lrec2016-main-529
Pages
pp. 3318-3324
BibKey
littell-etal-2016-bridge
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23-28 May 2016

Authors

  • Patrick Littell

  • David R. Mortensen

  • Kartik Goyal

  • Chris Dyer

  • Lori Levin
