LacrimosaNet: Creation & Investigation Towards Music-Aware Recommendation
Music recommendation has grown significantly in importance with the rise of on-demand music services and their vast catalogs. However, providing users with useful music suggestions differs from better-researched recommendation domains such as movies and consumer products. For example, the overwhelming number of available tracks causes a pronounced cold-start problem, especially for newly released tracks. Such problems can likely only be mitigated by researching and improving comprehensive content-based approaches like the network proposed here. LacrimosaNet is a content-based convolutional neural network that extracts audio features from spectrograms, guided by high-confidence collaborative-filtering similarities. The network is trained on a playlist dataset to learn features specifically related to content-based music perception, thereby differing from more narrowly defined classification problems such as genre detection and auto-tagging. Nevertheless, the general informative value of the learned features is tested on such a genre detection task, which not only defines a baseline for comparison but also validates the training and evaluation metrics used. As the importance of analysing learned patterns is often underestimated, this work is, to the best of our knowledge, the first to visualize auditory information with explanation procedures previously applied only to visual tasks. Finally, differences between constant-Q and MFCC spectrograms are discussed, as well as characteristics of the network such as the triplet loss function used during training.
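To make the training objective concrete, the following is a minimal NumPy sketch of the standard triplet loss named in the abstract. It is an illustration only: the margin value, squared-Euclidean distance, and embedding dimensionality are assumptions, not details taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss on embedding vectors.

    Pulls the anchor embedding toward the positive (a similar track)
    and pushes it away from the negative (a dissimilar track) until
    the two squared distances differ by at least `margin`.
    NOTE: margin=1.0 and squared Euclidean distance are assumed here,
    not specified by the paper.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # distance to similar track
    d_neg = np.sum((anchor - negative) ** 2)  # distance to dissimilar track
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: the loss vanishes once the negative is far enough away.
a = np.zeros(3)
far = np.full(3, 2.0)
print(triplet_loss(a, a, far))   # negative well separated -> 0.0
print(triplet_loss(a, far, far)) # equal distances -> full margin 1.0
```

In the paper's setting, the anchor/positive/negative triples would be track embeddings produced by the CNN, with positives chosen from high-confidence collaborative-filtering similarities.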