Back to Main Conference 2008
LREC 2008main

Automatic Identification of Temporal Information in Tourism Web Pages

Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008)

DOI:10.63317/5kgsq3wbv9eq

Abstract

This paper presents our work on the detection of temporal information in web pages. The pages examined within the scope of this study were taken from the tourism sector and the temporal information in question is thus particular to this area. The differences that exist between extraction from plain textual data and extraction from the web are brought to light. These differences mainly concern the spatial arrangement of the text, the use of punctuation and the respect of traditional syntactic rules. The temporal expressions to be extracted are classified into two kinds: temporal information that concerns one particular event and repetitive temporal information. We adopt a symbolic approach relying on patterns and rules for the detection, extraction and annotation of temporal expressions; our method is based on the use of transducers. First evaluations have shown promising results. Since the visual structure of a web page is very important and often informs the user before he has even read the text, a semiotic study is also presented in this paper.

Details

Paper ID
lrec2008-main-559
Pages
N/A
BibKey
weiser-etal-2008-automatic
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-4-0
Conference
Sixth International Conference on Language Resources and Evaluation
Location
Marrakech, Morocco
Date
28 May 2008 30 May 2008

Authors

  • SW

    Stéphanie Weiser

  • PL

    Philippe Laublet

  • JM

    Jean-Luc Minel

Links