Acceleration and Compression of Deep Click-Through Rate Prediction Models
Deep neural networks have shown great success in many different areas. However, these models are often computationally and memory intensive. Recommender systems in particular are sensitive to latency, so large models with long inference times are unsuitable for production in this field. In this thesis, we apply several compression methods to a state-of-the-art deep recommender system model to improve latency and reduce memory consumption while keeping accuracy as high as possible.