LREC 2016 main

Predicting Author Age from Weibo Microblog Posts

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/56a62nsjzo7e

Abstract

We report an author profiling study based on Chinese social media texts gleaned from Sina Weibo (新浪微博) in which we attempt to predict the author’s age group based on various linguistic text features mainly relating to non-standard orthography: classical Chinese characters, hashtags, emoticons and kaomoji, homogeneous punctuation and Latin character sequences, and poetic format. We also tracked the use of selected popular Chinese expressions, parts-of-speech and word types. We extracted 100 posts from 100 users in each of four age groups (under-18, 19-29, 30-39, over-40 years) and by clustering users’ posts fifty at a time we trained a maximum entropy classifier to predict author age group to an accuracy of 65.5%. We show which features are associated with younger and older age groups, and make our normalisation resources available to other researchers.
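As a rough illustration of the kind of orthographic features the abstract lists, the sketch below counts hashtags, Latin character sequences and homogeneous punctuation in each post, then sums those counts over groups of fifty posts. The regular expressions, the function names `extract_features` and `cluster_features`, and the reading of "clustering" as summing over consecutive posts are all assumptions made for illustration; the paper's actual feature definitions may differ, and the classifier itself is not shown.

```python
import re
from collections import Counter

# Hypothetical patterns, not taken from the paper.
HASHTAG = re.compile(r"#[^#\s]+#")                   # Weibo-style paired hashtags, e.g. #话题#
LATIN_SEQ = re.compile(r"[A-Za-z]{2,}")              # runs of two or more Latin letters
HOMOG_PUNCT = re.compile(r"([!?。！？~～.])\1{2,}")   # the same mark repeated 3+ times


def extract_features(post: str) -> Counter:
    """Count a few non-standard orthography features in one post."""
    feats = Counter()
    feats["hashtag"] = len(HASHTAG.findall(post))
    feats["latin_seq"] = len(LATIN_SEQ.findall(post))
    # finditer is used because the pattern contains a capture group.
    feats["homog_punct"] = sum(1 for _ in HOMOG_PUNCT.finditer(post))
    return feats


def cluster_features(posts: list[str], size: int = 50) -> list[Counter]:
    """Sum per-post feature counts over consecutive groups of `size` posts."""
    clusters = []
    for start in range(0, len(posts), size):
        total = Counter()
        for post in posts[start:start + size]:
            total.update(extract_features(post))
        clusters.append(total)
    return clusters
```

Under these assumptions, each summed `Counter` would serve as one training instance for a classifier; 100 posts from a user would yield two such instances.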

Details

Paper ID
lrec2016-main-478
Pages
pp. 2990-2997
BibKey
zhang-etal-2016-predicting
Editor
N/A
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 May 2016 to 28 May 2016

Authors

  • Wanru Zhang

  • Andrew Caines

  • Dimitrios Alikaniotis

  • Paula Buttery

Links