Author Profiling on Social Question and Answer Networks

Thesis Type Master
Thesis Status
Student Benjamin Binder
Thesis Supervisor
Research Field

Owing to its diverse real-world applications, author profiling has recently attracted growing attention in both academic research and industry. However, previous approaches have considered only text from blogs or microblogs. This thesis therefore analyzes user posts extracted from question-and-answer platforms, in particular the Stack Exchange network. It examines whether current author profiling approaches can be applied to these posts to predict the author’s gender. Because the dataset contains no gender information, a pre-trained neural network is used to infer each user’s gender from the profile picture. By comparing various combinations of text preprocessing approaches, bag-of-n-grams variations, dimensionality reduction approaches, and classifiers, the thesis shows that a multilayer perceptron combined with word n-grams and univariate feature selection achieves a classification accuracy of over 90%.
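To make the feature side of this pipeline more concrete, the following is a minimal sketch of word n-gram extraction followed by univariate feature selection. It is not the thesis's implementation: the function names (`word_ngrams`, `chi2_scores`, `select_k_best`), the whitespace tokenization, the presence-based chi-squared score, the toy posts, and the arbitrary 0/1 class labels are all assumptions made for illustration. The classifier itself is left out; in a full pipeline the selected n-grams would be turned into feature vectors and passed to a classifier such as a multilayer perceptron.

```python
from collections import Counter
from typing import Dict, List, Sequence


def word_ngrams(text: str, n_max: int = 2) -> List[str]:
    """Return all word n-grams of length 1..n_max (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def chi2_scores(docs: Sequence[str], labels: Sequence[int], n_max: int = 2) -> Dict[str, float]:
    """Score each n-gram with the chi-squared statistic of its 2x2
    presence-vs-class contingency table (a simplified, assumed variant)."""
    n_docs = len(docs)
    n_pos = sum(labels)
    n_neg = n_docs - n_pos
    present_pos: Counter = Counter()
    present_neg: Counter = Counter()
    for doc, label in zip(docs, labels):
        # Count each n-gram at most once per document (presence, not frequency).
        for gram in set(word_ngrams(doc, n_max)):
            if label == 1:
                present_pos[gram] += 1
            else:
                present_neg[gram] += 1
    scores = {}
    for gram in set(present_pos) | set(present_neg):
        a = present_pos[gram]  # class-1 documents containing the n-gram
        b = present_neg[gram]  # class-0 documents containing the n-gram
        c = n_pos - a          # class-1 documents without it
        d = n_neg - b          # class-0 documents without it
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[gram] = n_docs * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_k_best(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k highest-scoring n-grams; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]


# Toy posts with arbitrary class labels (hypothetical data, not from the thesis).
docs = [
    "python error in loop",
    "python loop help",
    "knitting pattern help",
    "knitting yarn pattern",
]
labels = [0, 0, 1, 1]

scores = chi2_scores(docs, labels)
top = select_k_best(scores, 4)
print(top)
```

In this toy example, n-grams that occur in only one class score highest, while an n-gram such as "help" that is spread evenly across both classes scores zero and is dropped. Scoring each feature independently of the others is what makes the selection univariate.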