Uploaded on Dec 1, 2015
This video shows trajectories generated within the Bicycle domain (Randløv and Alstrøm, 1998). Each trajectory is generated with either the Bellman operator or the Consistent Bellman operator. The Consistent Bellman operator is described in "Increasing the Action Gap: New Operators for Reinforcement Learning" by Bellemare et al. (2016). The paper is available at