An Architecture for Document Routing in Spanish: Two Language Components, Pre-processor and Parser

Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000)

DOI:10.63317/2unj34vqwz7e

Abstract

This paper describes the language components of a system for document routing in Spanish. The system uses natural language processing techniques to identify the terms in each document that are relevant for classification. These techniques are based on isolating and normalizing the syntactic units considered relevant for classification, especially noun phrases, but also other constituents built around verbs, adverbs, pronouns, or adjectives. After a general introduction to the research project, the second section relates our approach to previous and current approaches to the problem, and the third describes the corpora used to evaluate the system. The fourth and fifth sections describe the linguistic analysis architecture, which comprises pre-processing and two different levels of syntactic analysis. The last section presents a comparative analysis of the results obtained by processing the corpora introduced in the third section, and also outlines certain future developments of the system.

Details

Paper ID
lrec2000-main-068
Pages
N/A
BibKey
rojo-etal-2000-architecture
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Second International Conference on Language Resources and Evaluation
Location
Athens, Greece
Date
31 May 2000 to 2 June 2000

Authors

  • Guillermo Rojo

  • Maria Concepción Álvarez

  • Pilar Alvariño

  • Adelaida Gil

  • María Paula Santalla

  • Susana Sotelo
