LREC 2012 main

DutchSemCor: Targeting the ideal sense-tagged corpus

Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012)

DOI:10.63317/5iawtaf7ekbh

Abstract

Word Sense Disambiguation (WSD) systems require large sense-tagged corpora along with lexical databases to reach satisfactory results. The number of English language resources developed for WSD has increased in recent years, while most other languages are still under-resourced. The situation is no different for Dutch. To overcome this data bottleneck, the DutchSemCor project will deliver a Dutch corpus that is sense-tagged with senses from the Cornetto lexical database. In this paper, we discuss the conflicting requirements for a sense-tagged corpus and our strategies to fulfill them. We report on a first series of experiments to support our semi-automatic approach to building the corpus.

Details

Paper ID
lrec2012-main-049
Pages
pp. 584-589
BibKey
vossen-etal-2012-dutchsemcor
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-7-7
Conference
Eighth International Conference on Language Resources and Evaluation
Location
Istanbul, Turkey
Date
21 May 2012 to 27 May 2012

Authors

  • Piek Vossen
  • Attila Görög
  • Rubén Izquierdo
  • Antal van den Bosch
