The SemDaX Corpus ― Sense Annotations with Scalable Sense Inventories

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

Abstract

We launch the SemDaX corpus, a recently completed, human-annotated Danish corpus available through a CLARIN academic license. The corpus contains approximately 90,000 words, covers six textual domains, and is annotated with sense inventories of different granularity. The corpus has two aims: (i) to assess the reliability of the different sense annotation schemes for Danish, measured by qualitative analyses and annotation agreement scores, and (ii) to serve as training and test data for machine learning algorithms, with the practical purpose of developing sense taggers for Danish. To these ends, we take a new approach to human-annotated corpus resources by double annotating a much larger part of the corpus than is usual: 60% of the material for the all-words task and 100% for the lexical sample task. The corpus includes not only the adjudicated files but also the diverging annotations. In other words, we do not consider all disagreement to be noise; rather, we take it to contain valuable linguistic information that can help us improve our annotation schemes and our learning algorithms.

Details

Paper ID

lrec2016-main-136

Pages

pp. 842-847

DOI

10.63317/427qmnv4m5rs

BibKey

pedersen-etal-2016-semdax

Editors

Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis

Publisher

European Language Resources Association (ELRA)

ISSN

2522-2686

ISBN

978-2-9517408-9-1

Conference

Tenth International Conference on Language Resources and Evaluation

Location

Portorož, Slovenia

Date

23-28 May 2016

Authors

Bolette Pedersen
Anna Braasch
Anders Johannsen
Héctor Martínez Alonso
Sanni Nimb
Sussi Olsen
Anders Søgaard
Nicolai Hartvig Sørensen