
A methodology for the joint development of the Basque WordNet and Semcor

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006)

DOI:10.63317/3vbbqwm9n34i

Abstract

This paper describes the methodology adopted to jointly develop the Basque WordNet and a hand-annotated corpus (the Basque Semcor). This joint development allows for better-motivated sense distinctions and a tighter coupling between the two resources. The methodology involves editing, tagging and refereeing tasks. We are currently halfway through the nominal part of the 300,000-word corpus (roughly equivalent to a 500,000-word corpus for English). We present a detailed description of the task, including the main criteria for difficult cases in the editing of the senses and the tagging of the corpus, with special attention to multiword entries. Finally, we give a detailed picture of the current figures, as well as an analysis of the agreement rates.

Details

Paper ID
lrec2006-main-371
Pages
N/A
BibKey
agirre-etal-2006-methodology
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
2-9517408-2-4
Conference
Fifth International Conference on Language Resources and Evaluation
Location
Genoa, Italy
Date
24-26 May 2006

Authors

  • Eneko Agirre
  • Izaskun Aldezabal
  • Jone Etxeberria
  • Eli Izagirre
  • Karmele Mendizabal
  • Eli Pociello
  • Mikel Quintian
