SANJEEV SHARMA: 24th Oct 2010: REINFORCEMENT LEARNING: Phase-II, Presentation-1 (P2P1): Least-Squares Temporal Difference Learning.
CONTENTS: Value Function, Value Function Approximation, Linear Function Approximation, TD Learning, LSTD algorithm.
DESCRIPTION: The LSTD algorithm is a modification of the TD learning methods. Both target the same solution, the TD fixed point of a linear value-function approximation, but they reach it in different ways. TD is an incremental, online algorithm: after each transition it nudges the weight vector along the observed TD error, and these stochastic updates move the weights toward the point where the expected TD update is zero. LSTD is an offline, batch algorithm: it uses the whole batch of samples to build a linear system and solves it directly for the weight vector at which the (sample-estimated) expected TD update is zero. In this lecture I discuss the LSTD algorithm.
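To make the contrast concrete, here is a minimal pure-Python sketch of linear TD(0) and batch LSTD(0). It assumes the standard on-policy setting: each sample is a transition (phi(s), r, phi(s')), with phi(s') set to all zeros at episode termination. The shared fixed point is the theta with sum over samples of phi(s) (r + gamma phi(s')^T theta - phi(s)^T theta) = 0, i.e. A theta = b with A = sum phi(s) (phi(s) - gamma phi(s'))^T and b = sum phi(s) r. TD(0) approaches it one sample at a time; LSTD forms A and b once and solves the system. The helper names (`lstd`, `td0`, `solve`), the optional ridge term `reg`, and the 3-state chain are illustrative assumptions, not taken from the lecture; a small Gaussian-elimination solver is used only to avoid external dependencies.

```python
from typing import List, Sequence, Tuple

# One transition: (features of s, reward, features of s').
# Use an all-zero feature vector for s' when the episode terminates.
Transition = Tuple[Sequence[float], float, Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular; try a small ridge term reg > 0")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lstd(samples: List[Transition], gamma: float, reg: float = 0.0) -> List[float]:
    """Batch LSTD(0): accumulate A and b over all samples, then solve A theta = b.

    `reg` adds reg * I to A (an assumed ridge term for ill-conditioned batches).
    """
    k = len(samples[0][0])
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for phi, r, phi_next in samples:
        for i in range(k):
            b[i] += phi[i] * r
            for j in range(k):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


def td0(samples: List[Transition], gamma: float,
        alpha: float = 0.1, sweeps: int = 2000) -> List[float]:
    """Online linear TD(0): update theta along the TD error one sample at a time."""
    theta = [0.0] * len(samples[0][0])
    for _ in range(sweeps):
        for phi, r, phi_next in samples:
            delta = r + gamma * dot(phi_next, theta) - dot(phi, theta)  # TD error
            theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
    return theta


# Hypothetical 3-state chain s0 -> s1 -> s2 -> end, reward 1 on the last step,
# with one-hot (tabular) features, so the exact values are 0.81, 0.9 and 1.0.
e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zero = [0.0, 0.0, 0.0]
chain = [(e[0], 0.0, e[1]), (e[1], 0.0, e[2]), (e[2], 1.0, zero)]

print([round(v, 3) for v in lstd(chain, gamma=0.9)])  # [0.81, 0.9, 1.0]
print([round(v, 3) for v in td0(chain, gamma=0.9)])   # [0.81, 0.9, 1.0]
```

On this toy chain both methods land on the same weights; the difference is that LSTD gets there in one solve over the batch, with no step size to tune, while TD(0) needs many passes and a learning rate alpha but only constant work and memory per update.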