LREC 2018, Main Conference

Generating a Gold Standard for a Swedish Sentiment Lexicon

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DOI:10.63317/439usxg8i7vq

Abstract

There is an increasing demand for multilingual sentiment analysis, yet most work on sentiment lexicons is still based on English lexicons like WordNet. In addition, many of the non-English sentiment lexicons that do exist have been compiled by (machine) translation from English resources, thereby arguably obscuring possible language-specific characteristics of sentiment-loaded vocabulary. In this paper we describe the creation of a gold standard for the sentiment annotation of Swedish terms as a first step towards a full-fledged sentiment lexicon for Swedish, i.e., a lexicon containing information about prior sentiment (also called polarity) values of lexical items (words or disambiguated word senses) on a scale from negative to positive. The gold standard is built from the freely available SALDO lexicon and the Gigaword corpus. To create it, we employ a multi-stage approach that combines corpus-based frequency sampling with two stages of human annotation: direct score annotation followed by Best-Worst Scaling. In addition to obtaining a gold standard, we analyze the data from our process and draw conclusions about the optimal sentiment model.
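To make the Best-Worst Scaling stage more concrete, the following is a minimal sketch of the commonly used counting procedure, in which each term's score is the share of times it was chosen as best minus the share of times it was chosen as worst. The abstract does not say how the authors derive scores from their annotations, so the scoring rule, the function name `bws_scores`, the tuple layout, and the example Swedish terms and judgements are all illustrative assumptions.

```python
from collections import Counter


def bws_scores(annotations):
    """Score terms from Best-Worst Scaling judgements by simple counting.

    `annotations` is an iterable of (items, best, worst) tuples, where
    `items` is the tuple of terms shown to the annotator and `best` /
    `worst` are the terms picked as most positive / most negative.
    This layout is a hypothetical one chosen for illustration.
    """
    best = Counter()
    worst = Counter()
    seen = Counter()
    for items, b, w in annotations:
        if b not in items or w not in items:
            raise ValueError("best and worst must be among the shown items")
        seen.update(items)  # count every appearance of every term
        best[b] += 1
        worst[w] += 1
    # Each score lies in [-1, 1]: (times best - times worst) / appearances.
    return {term: (best[term] - worst[term]) / seen[term] for term in seen}


# Made-up judgements over made-up 4-tuples of Swedish terms.
annotations = [
    (("bra", "dålig", "hus", "krig"), "bra", "krig"),
    (("bra", "hus", "krig", "glad"), "glad", "krig"),
    (("dålig", "hus", "glad", "bra"), "glad", "dålig"),
]
scores = bws_scores(annotations)
for term, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{term}: {score:+.2f}")
```

With this rule, a term that is never selected as best or worst (here "hus") receives a score of 0, which is one reason the counting approach is often used for placing terms on a negative to positive scale.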

Details

Paper ID
lrec2018-main-426
Pages
N/A
BibKey
rouces-etal-2018-generating
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
979-10-95546-00-9
Conference
Eleventh International Conference on Language Resources and Evaluation
Location
Miyazaki, Japan
Date
7 May 2018 to 12 May 2018

Authors

  • Jacobo Rouces
  • Nina Tahmasebi
  • Lars Borin
  • Stian Rødven Eide
