This is a short video showcasing the paper "Unifying Count-Based Exploration and Intrinsic Motivation" by Bellemare, Srinivasan, Ostrovski, Schaul, Saxton, and Munos from Google DeepMind. https://arxiv.org/abs/1606.01868
The video depicts a DQN agent playing Montezuma's Revenge via the Arcade Learning Environment. The agent's reward function is augmented with an intrinsic reward based on a pseudo-count, itself computed from a sequential density model. This intrinsic reward allows the agent to explore a full two-thirds of the first level of the game and achieve significantly higher scores than anything previously reported.
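As a rough illustration of how a pseudo-count can be derived from a density model and turned into an exploration bonus, here is a minimal Python sketch. It is not the code used for the video. The `EmpiricalDensity` class is a hypothetical stand-in for the paper's sequential density model over frames, all function names are made up for this example, and the constants `beta=0.05` and `0.01` follow our reading of the paper and should be treated as assumptions.

```python
import math
from collections import Counter


class EmpiricalDensity:
    """Toy density model (an assumption for this sketch): the probability
    of a state is its empirical frequency among states seen so far."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def prob(self, x):
        # Probability assigned to x before observing it again.
        return self.counts[x] / self.total if self.total else 0.0

    def recoding_prob(self, x):
        # Probability the model would assign to x after one more
        # observation of x, computed without changing the model.
        return (self.counts[x] + 1) / (self.total + 1)

    def update(self, x):
        self.counts[x] += 1
        self.total += 1


def pseudo_count(rho, rho_prime):
    """Pseudo-count from the probability before (rho) and after
    (rho_prime) observing x: rho * (1 - rho_prime) / (rho_prime - rho)."""
    if rho_prime <= rho:
        # The model learned nothing from x; treat x as fully familiar.
        return math.inf
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def intrinsic_reward(model, x, beta=0.05):
    """Exploration bonus beta / sqrt(pseudo_count + 0.01); updates the model."""
    rho = model.prob(x)
    rho_prime = model.recoding_prob(x)
    model.update(x)
    return beta / math.sqrt(pseudo_count(rho, rho_prime) + 0.01)


model = EmpiricalDensity()
bonuses = [intrinsic_reward(model, s) for s in ["room_a", "room_b", "room_a"]]
```

With this toy model the pseudo-count matches the true visit count whenever more than one distinct state has been seen, so the bonus for the second visit to `room_a` is smaller than for the first. The agent in the video would add such a bonus to the game reward at every step.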
See also:
Explored rooms during training: https://youtu.be/2q4Tv4WSj_s
Episodes at 50 million frames: https://youtu.be/qeeTok1qDZk
Episode at 100 million frames: https://youtu.be/EzQwCmGtEHs