LREC 2026 workshop

Pontic Greek in the Caucasus: an online corpus

Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective

DOI: 10.63317/5h5zzmh3vmmh

Abstract

This paper presents a multimedia corpus of Pontic Greek as spoken in the Caucasus (Georgia). The corpus covers three major stages, each reflecting a different sociolinguistic setting: (a) Pontic Greek in small rural communities in Georgia (the original settlements); (b) internal migration to urban centers within Georgia; and (c) external migration to Greece. The dataset comprises 373 audio recordings (total duration: 7 h 26 min; total word count: 43,073). The open-access resource includes audio files (WAV) and annotations (XML). The annotations provide an orthographic transcription, a morphemic transcription, and morpheme-by-morpheme and sentence-by-sentence English translations (Toolbox); the transcriptions are time-aligned with the audio files (ELAN). The collection is intended for linguists working on dialectology and language contact, as well as for anyone with broader interests in the history and practices of this community. Pontic Greek in the Caucasus offers a unique opportunity to investigate contact between Greek and another Indo-European language (Russian), as well as two non-Indo-European languages (Georgian and Turkish).
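For readers who want to work with the time-aligned transcriptions programmatically, a minimal Python sketch follows. It assumes, without confirmation from the abstract, that the XML annotations are ELAN .eaf files (the abstract states only that the transcriptions are time-aligned in ELAN). The function name `eaf_tier_segments`, the tier name "tx", and the file name "recording.eaf" are hypothetical placeholders, since the corpus's actual tier names are not given here. The sketch reads the TIME_SLOT values (milliseconds) from ELAN's TIME_ORDER section and returns every time-aligned annotation on one tier as a (start, end, text) triple in seconds.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# (start in seconds, end in seconds, transcribed text)
Segment = Tuple[float, float, str]


def eaf_tier_segments(root: ET.Element, tier_id: str) -> List[Segment]:
    """Collect time-aligned annotations from one tier of an ELAN .eaf document.

    ASSUMPTION: the corpus annotations follow ELAN's EAF schema; the tier_id
    to pass in (e.g. "tx") is a hypothetical name, not taken from the corpus.
    """
    # Map each time slot ID to its value in milliseconds (None if unaligned).
    slots: Dict[str, Optional[int]] = {}
    for ts in root.iter("TIME_SLOT"):
        value = ts.get("TIME_VALUE")
        slots[ts.get("TIME_SLOT_ID")] = int(value) if value is not None else None

    segments: List[Segment] = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                # Skip annotations whose slots carry no time value.
                continue
            text = ann.findtext("ANNOTATION_VALUE", default="").strip()
            segments.append((start / 1000.0, end / 1000.0, text))

    # Return segments in chronological order.
    return sorted(segments)
```

Usage, with the same hypothetical names: `eaf_tier_segments(ET.parse("recording.eaf").getroot(), "tx")`. The sketch covers only alignable tiers; dependent tiers that ELAN stores as REF_ANNOTATION elements (for instance, glosses attached to a parent annotation) would need a separate lookup through their ANNOTATION_REF attribute.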

Details

Paper ID
lrec2026-ws-dialres-23
Pages
pp. 230-237
BibKey
berikashvili-etal-2026-pontic
Editors
Antonis Anastasopoulos, Stella Markantonatou, Angela Ralli, Marcos Zampieri, Stavros Bompolas, Vivian Stamou
Publisher
European Language Resources Association (ELRA)
ISSN
N/A
ISBN
N/A
Workshop
Proceedings of the First Workshop on Dialects in NLP — A Resource Perspective
Location
Palma, Mallorca, Spain
Date
11–16 May 2026

Authors

  • Svetlana Berikashvili
  • Stavros Skopeteas
