Back to Main Conference 2004
LREC 2004main

Highlighting Latent Structure in Documents

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/4sjf6ibymo3i

Abstract

Extensible Markup Language (XML) is playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. It is a simple, very flexible text format, used to annotate data by means of markup. XML documents can be checked for syntactic well-formedness and semantic coherence through DTD and schema validation which makes their processing easier. In particular, data with nested structure can be easily represented with embedded tags. This structured representation should be used in information retrieval models which take structure into account. As such, it is meta-data and therefore a contribution to the Semantic Web. However, nowadays, there exists huge quantities of raw texts and the issue is how to find an easy way to provide these texts with sensible XML structure. Here we present an automatic method to extract tree structure from raw texts. This work has been supported by the Paris XI University (BQR2002 project, Paris-XI University).

Details

Paper ID
lrec2004-main-118
Pages
N/A
BibKey
folch-etal-2004-highlighting
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26 May 2004 28 May 2004

Authors

  • HF

    H. Folch

  • BH

    B. Habert

  • MJ

    M. Jardino

  • NP

    N. Pernelle

  • MR

    M.C. Rousset

  • AT

    A. Termier

Links