Published on Mar 28, 2019
While reinforcement learning (RL) has achieved impressive advances in games and robotics, it has not been widely adopted in recommender systems. Framing recommendation as an RL problem offers new perspectives, but also faces significant challenges in practice. Industrial recommender systems deal with extremely large action spaces (many millions of items to recommend) and complex user state spaces (billions of users, each unique at any point in time). In this talk, I will discuss our work on scaling up a policy-gradient-based algorithm, REINFORCE, to a production recommender system at YouTube. We proposed algorithms to address data biases when deriving policy updates from logged implicit feedback. I will also discuss some follow-up work and outstanding research questions in applying RL, in particular off-policy optimization, in recommender systems.
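As a rough illustration of what correcting for logged-data bias in a REINFORCE update can look like, here is a minimal sketch for a softmax policy over a small item catalog. It is not the algorithm presented in the talk: the function names, the one-step reward, the importance weight pi(a)/beta(a) treated as a constant, and the weight cap are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of item logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def off_policy_reinforce_grad(logits, action, reward, behavior_prob, weight_cap=10.0):
    """Importance-weighted REINFORCE gradient w.r.t. the logits for one logged sample.

    logits        -- current policy logits, one per candidate item (hypothetical setup)
    action        -- index of the item that the logging policy actually showed
    reward        -- observed feedback for that item (e.g. a click, as a float)
    behavior_prob -- probability the logging policy assigned to that item
    weight_cap    -- assumed cap on the importance weight, to limit variance
    """
    pi = softmax(logits)
    # The importance weight corrects for the logged item having been chosen
    # by a different (behavior) policy than the one being trained.
    weight = min(pi[action] / behavior_prob, weight_cap)
    # For a softmax policy: d log pi(action) / d logits = one_hot(action) - pi
    grad_log_pi = -pi
    grad_log_pi[action] += 1.0
    return weight * reward * grad_log_pi


# Usage: four candidate items, uniform current policy, item 2 was logged.
grad = off_policy_reinforce_grad(np.zeros(4), action=2, reward=1.0, behavior_prob=0.5)
```

A gradient-ascent step would then add a small multiple of `grad` to the logits. In a real system the logits would come from a model over user state, and the behavior probability would usually have to be estimated rather than read from the logs.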