Construction of the Turkish National Corpus (TNC)

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/39gnocw3eigg

Abstract

This paper addresses theoretical and practical issues encountered in the construction of the Turkish National Corpus (TNC). TNC is designed to be a balanced, large-scale (50 million words), general-purpose corpus of contemporary Turkish. It has benefited from previous practices and efforts in corpus construction. In this sense, TNC generally follows the framework of the British National Corpus, with adjustments to its corpus design made where needed. Throughout the process, different types of open-source software are used for specific tasks, and the resulting corpus is a free resource for non-commercial use. This paper presents TNC's design features, its web-based corpus management system, its carefully planned workflow, and its web-based, user-friendly search interface.

Details

Paper ID
lrec2012-main-590
Pages
pp. 3223-3227
BibKey
aksan-etal-2012-construction
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21-27 May 2012

Authors

  • Yeşim Aksan
  • Mustafa Aksan
  • Ahmet Koltuksuz
  • Taner Sezer
  • Ümit Mersinli
  • Umut Ufuk Demirhan
  • Hakan Yılmazer
  • Gülsüm Atasoy
  • Seda Öz
  • İpek Yıldız
  • Özlem Kurtoğlu
