
An Open Architecture for the Construction and Administration of Corpora

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/4qzz8uq4n99q

Abstract

The use of language corpora for a variety of purposes has increased significantly in recent years. General corpora are now available for many languages, but research often requires more specialized corpora. The rapid development of the World Wide Web has greatly improved access to data in electronic form, but research has tended to focus on corpus annotation, rather than on corpus building tools. Therefore many researchers are building their own corpora, solving problems independently, and producing project-specific systems which cannot easily be re-used. This paper proposes an open client-server architecture which can service the basic operations needed in the construction and administration of corpora, but allows customisation by users in order to carry out project-specific tasks. The paper is based partly on recent practical experience of building a corpus of 10 million words of Written Business English from webpages, in a project which was co-funded by ELRA and the University of Wolverhampton.

Details

Paper ID
lrec2000-main-131
Pages
N/A
BibKey
orasan-krishnamurthy-2000-open
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Constantin Orăsan

  • Ramesh Krishnamurthy
