
SoriGraph: A New Database of Visual Feature-Level Descriptions of Written Korean

Proceedings of the Third Workshop on Computation and Written Language (CAWL 2026) @ LREC 2026

DOI:10.63317/3pgytnqveysz

Abstract

Phoneticians and phonologists have developed featural systems that enable systematic description of human speech sounds. However, no such systems exist for describing the visual features of writing systems. It is critical to understand the features of writing systems given their central role in many language users’ everyday experience. Just as phonetic and phonological features provide insight into speech perception, visual features can play a similar role for studying reading. In this paper, we introduce SoriGraph, a database of visual feature descriptions and IPA transcriptions for the full lexicon of Korean, drawing on a recent large-scale study of the visual features of writing systems. This database enables analysis of the visual and phonological properties of Korean and will be a critical resource for researchers. We describe the construction of the database, outline several potential uses, and demonstrate one of them: an information-theoretic analysis of lexicon structure.

Details

Paper ID
lrec2026-ws-cawl-04
Pages
pp. 45-49
BibKey
bushong-etal-2026-sorigraph
Editors
Kyle Gorman
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the Third Workshop on Computation and Written Language (CAWL 2026) @ LREC 2026
Location
Palma, Mallorca, Spain
Date
11 - 16 May 2026

Authors

  • Wednesday Bushong

  • Hala Habahbeh

  • Ryan Jiang

  • Yoolim Kim
