Consistent Bellman Operator on the Bicycle Task

Uploaded on Dec 1, 2015

This video shows trajectories generated within the Bicycle domain (Randløv and Alstrøm, 1998). Each trajectory is generated with either the Bellman operator or the Consistent Bellman operator. The Consistent Bellman operator is described in "Increasing the Action Gap: New Operators for Reinforcement Learning" by Bellemare et al. (2016). The paper is available at

http://www.marcgbellemare.info

Each trajectory is generated from the greedy policy corresponding to the value function after K = 10, 20, ... value iteration steps.
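
As a rough illustration of the procedure described above, here is a minimal tabular sketch in Python. It is not the code used for the video: the Bicycle domain is continuous, while this example assumes a small random MDP with a known transition tensor P and reward table R, and all function and variable names are hypothetical. The consistent backup follows the definition in Bellemare et al. (2016), in which a self-transition bootstraps from Q(x, a) instead of max_b Q(x, b).

```python
import numpy as np

def bellman_backup(Q, P, R, gamma):
    """Standard Bellman backup. P has shape (S, A, S); R and Q have shape (S, A)."""
    V = Q.max(axis=1)                       # V(x') = max_b Q(x', b)
    return R + gamma * (P @ V)              # expectation over next states x'

def consistent_bellman_backup(Q, P, R, gamma):
    """Consistent Bellman backup: when x' == x, bootstrap from Q(x, a)
    instead of max_b Q(x, b)."""
    V = Q.max(axis=1)
    self_prob = np.einsum("sas->sa", P)     # P(x | x, a), shape (S, A)
    # Equal to the Bellman backup minus gamma * P(x|x,a) * (V(x) - Q(x,a)).
    return R + gamma * (P @ V) - gamma * self_prob * (V[:, None] - Q)

def greedy_policies(backup, P, R, gamma, checkpoints):
    """Run value iteration from Q = 0 and record the greedy policy
    after each K listed in `checkpoints`."""
    Q = np.zeros_like(R, dtype=float)
    policies = {}
    for k in range(1, max(checkpoints) + 1):
        Q = backup(Q, P, R, gamma)
        if k in checkpoints:
            policies[k] = Q.argmax(axis=1)  # greedy action in every state
    return policies

# Toy MDP (an assumption for illustration, unrelated to the Bicycle task).
rng = np.random.default_rng(0)
S, A = 5, 2
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)           # normalise rows into distributions
R = rng.random((S, A))

bellman_policies = greedy_policies(bellman_backup, P, R, 0.9, [10, 20])
consistent_policies = greedy_policies(consistent_bellman_backup, P, R, 0.9, [10, 20])
```

Each entry of `bellman_policies` or `consistent_policies` maps a step count K to an array holding one greedy action per state; rolling out such a policy in a simulator would give trajectories of the kind shown in the video.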

See also videos comparing a Deep Q-Network agent trained on Space Invaders with similar operators:

https://youtu.be/wDfUnMY3vF8
https://youtu.be/1u9p-7i1ymY
