Multilingual XML-Based Named Entity Recognition for E-Retail Domains

Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002)

DOI:10.63317/4s4rcjj6y6xy

Abstract

We describe the multilingual Named Entity Recognition and Classification (NERC) subpart of an e-retail product comparison system which is currently under development as part of the EU-funded project CROSSMARC. The system must be rapidly extensible, both to new languages and to new domains. To achieve this aim, we use XML as our common exchange format, and the monolingual NERC components use a combination of rule-based and machine-learning techniques. It has been challenging to process web pages that contain heavily structured data, where text is intermingled with HTML and other code. Our preliminary evaluation results demonstrate the viability of our approach.

Details

Paper ID
lrec2002-main-233
Pages
N/A
BibKey
grover-etal-2002-multilingual
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
N/A
Conference
Third International Conference on Language Resources and Evaluation
Location
Las Palmas, Spain
Date
29-31 May 2002

Authors

  • Claire Grover
  • Scott McDonald
  • Donnla Nic Gearailt
  • Vangelis Karkaletsis
  • Dimitra Farmakiotou
  • Georgios Samaritakis
  • Georgios Petasis
  • Maria Teresa Pazienza
  • Michele Vindigni
  • Frantz Vichot
  • Francis Wolinski
