Back to Main Conference 2004
LREC 2004main

Experiences in Collection of Handwriting Data for Online Handwriting Recognition in Indic Scripts

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/58utgfkytti7

Abstract

In this paper, we describe initial efforts at Hewlett-Packard Labs, Bangalore, to create datasets of online handwriting in Indic scripts to support research in online handwriting recognition for the Indic scripts. The term "online" here refers to the fact that handwriting is captured as a stream of (x,y) points using an appropriate pen position sensor (often called a digitizer), rather than as a bitmap (image). The paper describes the structure of Indic scripts in brief. It identifes different choices for segmenting characters into simpler shapes that can then be recognized using pattern recognition techniques. The paper discusses these issues in the context of the Tamil script. The remainder of the paper provides an overview of two distinct data collection efforts for the Tamil script - one at the isolated character level, and the other for isolated words. In the context of these efforts, we briefy describe the data collection procedure, tools for collection and subsequent annotation, user-interface issues, the annotation scheme, and the organization of the dataset. The paper concludes with the current status of the effort and future directions.

Details

Paper ID
lrec2004-main-022
Pages
N/A
BibKey
bhaskarabhatla-madhvanath-2004-experiences
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 28 May 2004

Authors

  • AB

    Ajay S. Bhaskarabhatla

  • SM

    Sriganesh Madhvanath

Links