LREC 2002 (main conference)

A Unicode-based Environment for Creation and Use of Language Resources

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/2onkmujgskxz

Abstract

GATE is a Unicode-aware architecture, development environment and framework for building systems that process human language. It is often thought that the character-set problem was solved by the arrival of the Unicode standard. The standard is an important advance, but in practice the ability to process text in many of the world's languages is still limited. This paper describes work done in the context of the GATE project that makes use of Unicode and plugs some of the gaps for language processing R&D. First, we look at the storage and decoding of Unicode-compliant linguistic resources. Next, we detail the new capabilities for processing textual data and taking advantage of the Unicode standard. Finally, we describe the solutions used to add Unicode display and editing capabilities to the graphical interface.

Details

Paper ID: lrec2002-main-215
Pages: N/A
BibKey: tablan-etal-2002-unicode
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: N/A
Conference: Third International Conference on Language Resources and Evaluation
Location: Las Palmas, Spain
Date: 29 May 2002 to 31 May 2002

Authors

  • Valentin Tablan
  • Cristian Ursu
  • Kalina Bontcheva
  • Hamish Cunningham
  • Diana Maynard
  • Oana Hamza
  • Tony McEnery
  • Paul Baker
  • Mark Leisher

Links