Back to Main Conference 2004
LREC 2004main

Migrating Language Resources from SGML to XML: The Text Encoding Initiative Recommendations

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/5g9zvejwf2oq

Abstract

The Text Encoding Initiative (TEI), established in 1987, has been the largest effort in the area of standardisation of computer encoding of language resources. TEI chose SGML (Standard Generalized Markup Language) as its underlying standard, and in the years before the inception of XML, a number of projects encoded their data according to some SGML DTD, TEI compliant, or otherwise. These projects could now benefit from migrating their data to XML. Apart from validation, the most compelling reason for migration is the scarcity of SGML-aware software and the abundance of XML-based tools and related recommendations. However, despite the fact that XML is a subset of SGML, migration is not a trivial process, especially in the case of large holdings of legacy language resources. This is why in 2002 the TEI Consortium established a Task Force on SGML to XML migration. The TF has now produced a number of reports that simplify and make explicit the conversion of SGML TEI (version P3) to XML TEI (version P4) documents. The reports are also relevant for a general audience of SGML users that are considering migrating their language resources to XML. This paper presents the recommendations made by the TF, concentrating on strategic considerations, the practical guide, and one case study, the conversion of the British National Corpus.

Details

Paper ID
lrec2004-main-301
Pages
N/A
BibKey
bauman-etal-2004-migrating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 28 May 2004

Authors

  • SB

    Syd Bauman

  • AB

    Alejandro Bia

  • LB

    Lou Burnard

  • TE

    Tomaž Erjavec

  • CR

    Christine Ruotolo

  • SS

    Susan Schreibman

Links