The Data Acquisition Framework: Bridging Psychometrics and NLP for Personality Dataset Construction
Proceedings of the 1st Workshop on Social Context (SoCon) and the 2nd Workshop on Integrating NLP and Psychology to Study Social Interactions (NLPSI) @ LREC 2026
Abstract
Existing datasets for personality recognition in Natural Language Processing (NLP) suffer from documented quality problems: self-reported labels that lack psychometric validation, limited domain diversity, and a lack of context. Despite these known limitations, state-of-the-art approaches continue to rely on the same datasets because no alternatives exist. We present the Data Acquisition Framework (DAF), which addresses this gap by systematically translating psychometric questionnaire items into controlled communication scenarios through expert-community validation. DAF-Items, validated scenario descriptions with contextual parameters, are deployed via the Automatic Data Acquisition and Annotation Tool (ADAAT). Participants complete personality surveys and engage in scenario-based text interactions with LLM personas configured to the DAF-Item context. This yields communication data with direct, item-level psychometric annotations.
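
To make the notion of item-level annotation concrete, the following is a minimal illustrative sketch of how a scenario derived from a questionnaire item could be paired with a participant's survey response to that same item. It is not the ADAAT implementation: the class names, field names, and the `annotate` helper are hypothetical, and the example values are invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DAFItem:
    """Hypothetical schema: a validated scenario tied to one questionnaire item."""
    item_id: str                                  # source questionnaire item
    scenario: str                                 # validated scenario description
    context: dict = field(default_factory=dict)   # contextual parameters


@dataclass
class AnnotatedInteraction:
    """Participant text labeled with the response to the source item."""
    participant_id: str
    item_id: str
    item_score: int    # participant's survey response to the source item
    messages: list     # participant's messages in the scenario


def annotate(participant_id, survey, daf_item, messages):
    """Attach the item-level survey score to text from one scenario."""
    return AnnotatedInteraction(
        participant_id=participant_id,
        item_id=daf_item.item_id,
        item_score=survey[daf_item.item_id],  # look up by shared item id
        messages=list(messages),
    )


# Invented example values, for illustration only.
item = DAFItem("E1", "You arrive at a gathering where you know nobody.",
               {"setting": "informal"})
record = annotate("p01", {"E1": 4}, item, ["Hi, I'm new here."])
```

The point of the sketch is the shared `item_id`: because each scenario originates from a single questionnaire item, the text collected in that scenario can be labeled with the response to that item rather than only with an aggregate trait score.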