Classification of User-Generated Online Reviews Using Machine Learning

Thesis Type Master
Thesis Status Currently running
Student Markus Rennemeyer
Start
Thesis Supervisor
Contact

Due to the growing volume of online messages and new practical applications such as computer forensics and law enforcement, authorship attribution has become an increasingly active field of research. Previous work, however, has focused mostly on either long documents such as books or very short documents such as tweets, leaving medium-length texts less explored. This thesis therefore performs an authorship attribution task on the Yelp data set, which contains user-submitted posts with a length of up to 5000 characters. Using a support vector machine as the classifier, together with four feature sets and several input parameters, it was possible to achieve an attribution accuracy of over 92%.
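
As a rough illustration of what one feature set for authorship attribution might look like, the sketch below turns a single review into a small dictionary of style features. The thesis does not name its four feature sets here, so the function `stylometric_features`, the `FUNCTION_WORDS` list, and every feature in it are assumptions chosen only for illustration. In a full pipeline, vectors like these would be passed to a support vector machine, for example scikit-learn's `SVC`.

```python
import string
from collections import Counter

# Hypothetical function-word list; the thesis does not specify its features.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "i", "it")


def stylometric_features(text: str) -> dict[str, float]:
    """Return a few simple style features for one review (illustrative only)."""
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = [w.strip(string.punctuation) for w in text.lower().split()]
    tokens = [t for t in tokens if t]

    # Guard against division by zero for empty input.
    n_chars = max(len(text), 1)
    n_tokens = max(len(tokens), 1)
    counts = Counter(tokens)

    features = {
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        "type_token_ratio": len(counts) / n_tokens,
        "punctuation_rate": sum(c in string.punctuation for c in text) / n_chars,
        "uppercase_rate": sum(c.isupper() for c in text) / n_chars,
    }
    # Relative frequency of each function word.
    for word in FUNCTION_WORDS:
        features[f"fw_{word}"] = counts[word] / n_tokens
    return features


review = "The food was great, and the service was fast!"
features = stylometric_features(review)
```

Features of this kind are topic-independent by design: they describe how an author writes rather than what the review is about, which is why they are common in authorship attribution.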