
What is my Style? Using Stylistic Features of Portuguese Web Texts to Classify Web Pages According to Users’ Needs

Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)

DOI:10.63317/5fwy6axsx45i

Abstract

In this paper we investigate the use of stylistic features of Web texts in Portuguese to classify Web pages according to users’ needs, in order to improve Web Information Retrieval. We first describe a seven-category classification of users’ needs, which was the outcome of a qualitative analysis of two logs from TodoBr, a major Brazilian search engine. We then describe 46 shallow linguistic features, inspired by the work of Biber and Karlgren, and the compilation of the corpus used to train the classifier. Our aim is to obtain rules that can be applied to the classification of Web texts according to those seven users’ needs. Some experiments are reported, showing that, at least for some of the categories, reliable identification is possible.

Details

Paper ID
lrec2004-main-305
Pages
N/A
BibKey
aires-etal-2004-style
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-1-6
Conference
Fourth International Conference on Language Resources and Evaluation
Location
Lisbon, Portugal
Date
26-28 May 2004

Authors

  • Rachel Aires
  • Aline Manfrin
  • Sandra Aluísio
  • Diana Santos
