Lecture 16 | Machine Learning (Stanford)
14,033
All Comments (6)
In the computation you start at 57, to determine why moving west is better than moving north from the (3,1) state, it seems that you disregarded or forgot the discount factor without mentioning it. I do think that in this case it suffices to look at undiscounted values to determine the optimal action, because there are no intermediate rewards. I find this just a bit misleading, but I also wanted to share my thoughts. Great lecture (so far)!
This has been very useful to me! Thank you!
MatthewHudghton 9 months ago 4
This is a very good introduction lecture on Reinforcement Learning!
GilCohen82 1 year ago