
Predicting Author Age from Weibo Microblog Posts

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/56a62nsjzo7e

Abstract

We report an author profiling study based on Chinese social media texts gleaned from Sina Weibo (新浪微博) in which we attempt to predict the author’s age group based on various linguistic text features mainly relating to non-standard orthography: classical Chinese characters, hashtags, emoticons and kaomoji, homogeneous punctuation and Latin character sequences, and poetic format. We also tracked the use of selected popular Chinese expressions, parts-of-speech and word types. We extracted 100 posts from 100 users in each of four age groups (under-18, 19-29, 30-39, over-40 years) and by clustering users’ posts fifty at a time we trained a maximum entropy classifier to predict author age group to an accuracy of 65.5%. We show which features are associated with younger and older age groups, and make our normalisation resources available to other researchers.
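
As a rough illustration of the kind of surface features the abstract lists, the sketch below counts hashtags, Latin character sequences and runs of repeated punctuation in each post, then sums the counts over groups of fifty posts. The regular expressions, the function names `post_features` and `chunk_features`, and the reading of "fifty at a time" as consecutive groups of fifty posts are all assumptions made for illustration; the paper's own feature definitions and grouping may differ. No classifier is included here.

```python
import re
from collections import Counter

# Hypothetical patterns; the paper's actual feature definitions may differ.
HASHTAG = re.compile(r"#[^#\s]+#")                  # Weibo-style #topic# hashtags
LATIN_RUN = re.compile(r"[A-Za-z]{2,}")             # runs of two or more Latin letters
REPEATED_PUNCT = re.compile(r"([!?。！？~～.])\1+")   # e.g. "!!!" or "。。。"


def post_features(post: str) -> Counter:
    """Count a few surface features in a single post."""
    return Counter(
        hashtags=len(HASHTAG.findall(post)),
        latin_sequences=len(LATIN_RUN.findall(post)),
        repeated_punctuation=sum(1 for _ in REPEATED_PUNCT.finditer(post)),
    )


def chunk_features(posts: list[str], size: int = 50) -> list[Counter]:
    """Sum per-post counts over consecutive groups of `size` posts."""
    chunks = []
    for start in range(0, len(posts), size):
        total = Counter()
        for post in posts[start:start + size]:
            total.update(post_features(post))
        chunks.append(total)
    return chunks
```

Under these assumptions, each user's 100 posts would yield two feature vectors, which could then be passed to a maximum entropy (multinomial logistic regression) classifier.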

Details

Paper ID
lrec2016-main-478
Pages
pp. 2990-2997
BibKey
zhang-etal-2016-predicting
Editors
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis
Publisher
European Language Resources Association (ELRA)
ISSN
2522-2686
ISBN
978-2-9517408-9-1
Conference
Tenth International Conference on Language Resources and Evaluation
Location
Portorož, Slovenia
Date
23 - 28 May 2016

Authors

  • Wanru Zhang

  • Andrew Caines

  • Dimitrios Alikaniotis

  • Paula Buttery
