Age and Gender Prediction on Health Forum Data

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)

DOI:10.63317/4fd2s2dy33rv

Abstract

Health support forums have become a rich source of data that can be used to improve health care outcomes. A user profile, including information such as age and gender, can support targeted analysis of forum data. But users might not always disclose their age and gender. It is therefore desirable to be able to automatically extract this information from users' content. However, to the best of our knowledge, there is no such resource for author profiling of health forum data. Here we present a large corpus, with close to 85,000 users, for profiling and also outline our approach and benchmark results to automatically detect a user's age and gender from their forum posts. We use a mix of features from a user's text as well as forum-specific features to obtain accuracy well above the baseline, thus showing that both our dataset and our method are useful and valid.

Details

Paper ID: lrec2016-main-541
Pages: pp. 3394-3401
BibKey: shrestha-etal-2016-age
Editor: N/A
Publisher: European Language Resources Association (ELRA)
ISSN: 2522-2686
ISBN: 978-2-9517408-9-1
Conference: Tenth International Conference on Language Resources and Evaluation
Location: Portorož, Slovenia
Date: 23 May 2016 to 28 May 2016

Authors

  • Prasha Shrestha
  • Nicolas Rey-Villamizar
  • Farig Sadeque
  • Ted Pedersen
  • Steven Bethard
  • Thamar Solorio

Links