Give me the strength to absorb this lecture.
grunder20 1 month ago
sparkling lecture!!
grunder20 2 months ago
In the computation you start at 57, to determine why moving west is better than moving north from the (3,1) state, it seems that you left out the discount factor without mentioning it. I do think that in this case it suffices to look at undiscounted values to determine the optimal action, because there are no intermediate rewards. I find this just a bit misleading, so I wanted to share my thoughts. Great lecture (so far)!
jacobakkerboom 2 months ago
This has been very useful to me! Thank you!
MatthewHudghton 9 months ago 4
This is a very good introduction lecture on Reinforcement Learning!
GilCohen82 1 year ago