Back to Main Conference 2014
LREC 2014main

PACE Corpus: a multilingual corpus of Polarity-annotated textual data from the domains Automotive and CEllphone

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014)

DOI:10.63317/46vg93sgot95

Abstract

In this paper, we describe a publicly available multilingual evaluation corpus for phrase-level Sentiment Analysis that can be used to evaluate real world applications in an industrial context. This corpus contains data from English and German Internet forums (1000 posts each) focusing on the automotive domain. The major topic of the corpus is connecting and using cellphones to/in cars. The presented corpus contains different types of annotations: objects (e.g. my car, my new cellphone), features (e.g. address book, sound quality) and phrase-level polarities (e.g. the best possible automobile, big problem). Each of the posts has been annotated by at least four different annotators ― these annotations are retained in their original form. The reliability of the annotations is evaluated by inter-annotator agreement scores. Besides the corpus data and format, we provide comprehensive corpus statistics. This corpus is one of the first lexical resources focusing on real world applications that analyze the voice of the customer which is crucial for various industrial use cases.

Details

Paper ID
lrec2014-main-240
Pages
pp. 2219-2224
BibKey
haenig-etal-2014-pace
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-8-4
Conference
Ninth International Conference on Language Resources and Evaluation
Location
Reykjavik, Iceland
Date
26 May 2014 31 May 2014

Authors

  • CH

    Christian Haenig

  • AN

    Andreas Niekler

  • CW

    Carsten Wuensch

Links