Collection of Negative Samples for Hit-Song Prediction
Hit song prediction is the task of predicting whether a given song will become a hit, e.g., whether it will make it into the charts. One way to realize this is as a binary classification model that assigns a given song to one of two classes: hit or non-hit.
To train such a machine learning model, both positive samples (hits) and negative samples (non-hits) are required. Obtaining positive samples is relatively straightforward -- we can define all songs that made it into the charts (e.g., the Billboard Hot 100) as positive samples. Negative samples, on the other hand, are trickier for two reasons:
We have to ensure that the songs we choose as negative samples had the chance to make it into the charts we use as our source for the positive samples.
For positive samples, there is a natural measure for positivity -- the peak position the song reached on the charts. No such natural measure exists for negative samples.
The goals of this thesis are to devise a method for collecting true negative samples and to develop an effective measure for negativity.
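
To make the two difficulties concrete, the following is a minimal Python sketch, not the method this thesis will develop. It rests on several assumptions that are not part of the text above: the `Song` record, the `positivity` normalization of the peak position, and the use of a release-date window (`chart_start`, `chart_end`) as a crude stand-in for "had the chance to make it into the charts" are all hypothetical. No negativity measure is shown, since finding one is the open question.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

CHART_SIZE = 100  # assumption: a 100-position chart such as the Billboard Hot 100


@dataclass(frozen=True)
class Song:
    """Hypothetical song record used only for this illustration."""
    title: str
    release_date: date
    peak_position: Optional[int] = None  # None means the song never charted


def positivity(song: Song) -> float:
    """Map a peak position in 1..CHART_SIZE to a score in (0, 1].

    A song that peaked at position 1 scores 1.0. This linear scaling is one
    possible choice, not an established standard.
    """
    if song.peak_position is None:
        raise ValueError("positivity is only defined for charted songs")
    return (CHART_SIZE - song.peak_position + 1) / CHART_SIZE


def split_samples(
    songs: List[Song], chart_start: date, chart_end: date
) -> Tuple[List[Song], List[Song]]:
    """Return (positives, negative candidates).

    Positives are all charted songs. Negative candidates are uncharted songs
    released inside the period covered by the chart data, which is a rough
    proxy for eligibility; songs outside that period are discarded.
    """
    positives = [s for s in songs if s.peak_position is not None]
    negatives = [
        s
        for s in songs
        if s.peak_position is None and chart_start <= s.release_date <= chart_end
    ]
    return positives, negatives
```

Note that the release-date filter only addresses the first difficulty, and only approximately: a song released within the covered period may still have been ineligible for other reasons (for example, it was never distributed in the chart's market), so the resulting negatives are candidates rather than confirmed true negatives.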