
SandhiKosh: A Benchmark Corpus for Evaluating Sanskrit Sandhi Tools

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/3hbzopxzt5fx

Abstract

Sanskrit is an ancient Indian language. Several important texts that are of interest to people all over the world today were written in Sanskrit. Sanskrit grammar has a precise and complete specification, given in the text Astadhyayi by Panini. This has led to the development of a number of Sanskrit Computational Linguistics tools for processing and analyzing Sanskrit texts. Unfortunately, there has been no effort to standardize and critically validate these tools. In this paper, we develop a Sanskrit benchmark called SandhiKosh to evaluate the completeness and accuracy of Sanskrit Sandhi tools. We present the results of this benchmark on three most prominent Sanskrit tools and demonstrate that these tools have substantial scope for improvement. This benchmark will be freely available to researchers worldwide, and we hope it will help everyone working in this area evaluate and validate their tools.

Details

Paper ID
lrec2018-main-712
Pages
N/A
BibKey
bhardwaj-etal-2018-sandhikosh
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 to 12 May 2018

Authors

  • Shubham Bhardwaj

  • Neelamadhav Gantayat

  • Nikhil Chaturvedi

  • Rahul Garg

  • Sumeet Agarwal

Links