LT TTT - A Flexible Tokenisation Tool

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/5ddt5mtty48f

Abstract

We describe LT TTT, a recently developed software system which provides tools to perform text tokenisation and mark-up. The system includes ready-made components to segment text into paragraphs, sentences, words and other kinds of token but, crucially, it also allows users to tailor rule-sets to produce mark-up appropriate for particular applications. We present three case studies of our use of LT TTT: named-entity recognition (MUC-7), citation recognition and mark-up and the preparation

Details

Paper ID
lrec2000-main-070
Pages
N/A
BibKey
grover-etal-2000-lt
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Claire Grover

  • Colin Matheson

  • Andrei Mikheev

  • Marc Moens
