Unsupervised Parts-of-Speech Induction for Bengali

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/3vmkqcbf28x6

Abstract

We present a study of the word interaction networks of Bengali in the framework of complex networks. The topological properties of these networks reveal insights into the morpho-syntax of the language, while clustering helps induce the natural word classes, leading to a principled way of designing POS tagsets. We compare different network construction techniques and clustering algorithms based on the cohesiveness of the word clusters they produce. Cohesiveness is measured against two gold-standard tagsets using a novel metric, tag-entropy. The approach is generic and can be easily extended to any language.

Details

Paper ID
lrec2008-main-242
BibKey
nath-etal-2008-unsupervised
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28-30 May 2008

Authors

  • Joydeep Nath
  • Monojit Choudhury
  • Animesh Mukherjee
  • Christian Biemann
  • Niloy Ganguly

Links