REINFORCEMENT LEARNING: Lecture-2: Value Functions and Markov Property. Sanjeev Sharma, Founder & Co-Owner - searching-eye.com, undergraduate, IIT R.
In this lecture I discussed episodic and continuing tasks. I also covered states, rewards, returns, the discounted return and the agent-environment interaction process. I explained the role of the discount parameter and proved that discounting keeps the expected return finite (for bounded rewards and a discount factor strictly below 1). Then I discussed the two kinds of value functions of a policy: the state-value function and the action-value function. I also derived the expression for the state-value function of a policy and gave an interpretation of each term in the Bellman equation. Finally, I gave a very brief introduction to the Markov property, Markov states and MDPs. More details about the Bellman equation and MDPs will be discussed in Lecture 3. Topics such as the Bellman optimality equation and the relation between the state-value and action-value functions are skipped here, as they will be the topic of Lecture 3.
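The quantities summarized above can be made concrete with a short numerical sketch. The Python/NumPy code below is an illustration under stated assumptions, not material from the lecture itself. It computes a discounted return G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ..., the geometric-series bound R_max / (1 - γ) that keeps the return finite when every reward satisfies |R| ≤ R_max and 0 ≤ γ < 1, and the state-value function v_π obtained by repeatedly applying the Bellman equation v_π(s) = Σ_a π(a|s) Σ_s' p(s'|s,a) [r(s,a,s') + γ v_π(s')] (iterative policy evaluation). The function names, the array layout P[s, a, s'], R[s, a, s'], policy[s, a], and the two-state MDP are all hypothetical choices made for this example.

```python
import numpy as np


def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ... for a finite reward list."""
    g = 0.0
    # Work backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def return_bound(r_max, gamma):
    """Upper bound on |G_t| when every |R| <= r_max and 0 <= gamma < 1 (geometric series)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("the bound r_max / (1 - gamma) only holds for 0 <= gamma < 1")
    return r_max / (1.0 - gamma)


def evaluate_policy(P, R, policy, gamma, tol=1e-10):
    """Iterative policy evaluation: apply the Bellman equation for v_pi until it stops changing.

    P[s, a, s2]   probability of moving from s to s2 under action a (hypothetical layout)
    R[s, a, s2]   expected reward for that transition
    policy[s, a]  probability pi(a | s)
    """
    v = np.zeros(P.shape[0])
    while True:
        # v(s) = sum_a pi(a|s) sum_s2 p(s2|s,a) * [r(s,a,s2) + gamma * v(s2)]
        target = R + gamma * v[np.newaxis, np.newaxis, :]
        v_new = np.einsum("ij,ijk,ijk->i", policy, P, target)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new


# Hypothetical two-state MDP: action 0 = "stay", action 1 = "switch".
# Every transition that lands in state 1 pays reward 1; all others pay 0.
n_states, n_actions = 2, 2
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, s] = 1.0        # stay in the same state
    P[s, 1, 1 - s] = 1.0    # switch to the other state
R = np.zeros((n_states, n_actions, n_states))
R[:, :, 1] = 1.0
policy = np.full((n_states, n_actions), 0.5)  # uniformly random policy

gamma = 0.9
print("G for rewards [1, 1, 1]:", discounted_return([1, 1, 1], gamma))
print("bound on |G_t|:", return_bound(1.0, gamma))
print("v_pi:", evaluate_policy(P, R, policy, gamma))
```

Under the uniformly random policy, each state of this hypothetical MDP lands in state 1 with probability 0.5 on every step, so v(s) = 0.5 + 0.9 v(s) and both values come out close to 5.0, well inside the bound of 10. Iteration is used here only because it mirrors the Bellman equation term by term; for a small MDP one could equally solve the linear system directly.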
Every video is available in full length as a single file on searching-eye[dot]com. The YouTube channel only shows some sample videos.
sanjeev3007 1 year ago