Added: 3 years ago
From: StanfordUniversity
Views: 14,019
All Comments (6)

  • In the computation you start at 57, which determines why moving west is better than moving north from state (3,1), it seems that you disregarded or forgot the discount factor without mentioning it. I do think that in this case it suffices to look at undiscounted values to determine the optimal action, because there are no intermediate rewards. I find this just a bit misleading, but I also wanted to share my thoughts. Great lecture (so far)!

  • This has been very useful to me! Thank you!

  • This is a very good introductory lecture on Reinforcement Learning!
