Reinforcement Learning: Least-Squares Temporal Difference Learning (P2P1), Part 1

Uploaded on Oct 24, 2010

SANJEEV SHARMA : 24th Oct 2010: REINFORCEMENT LEARNING: Phase-II, Presentation-1 (P2P1): Least-Squares Temporal Difference Learning.

CONTENTS: Value Function, Value Function Approximation, Linear Function Approximation, TD Learning, LSTD algorithm.

DESCRIPTION: The LSTD algorithm is a modification of the TD learning methods. The two are the same in spirit, since both seek the same fixed point, but they differ in how they reach it. TD is an incremental, online algorithm, whereas LSTD is an offline, batch algorithm. TD moves the weight vector a small step along each sampled TD update and so converges to the point where the expected TD update is zero, whereas LSTD directly computes the weight vector for which the expected TD update, as estimated from the samples, is zero. This lecture discusses the LSTD algorithm.
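
The contrast between the two methods can be illustrated with a small sketch. It is not taken from the lecture: it assumes the standard LSTD(0) form with linear features, where A = sum of phi(s) (phi(s) - gamma phi(s'))^T and b = sum of phi(s) r are accumulated over a batch and the weights solve A w = b. The names `lstd`, `td0`, `solve`, and `BATCH`, the two-state chain, the discount 0.9, and the step size 0.1 are all hypothetical choices made for the illustration.

```python
# Minimal sketch (assumed setup, not the lecture's code): LSTD(0) versus TD(0)
# with linear value-function approximation V(s) = w . phi(s).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular; the batch does not determine w")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lstd(transitions, k, gamma):
    """Batch LSTD(0): accumulate A and b over all samples, then solve once."""
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for phi, r, phi_next in transitions:
        for i in range(k):
            b[i] += phi[i] * r
            for j in range(k):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)

def td0(transitions, k, gamma, alpha, sweeps):
    """Incremental TD(0): one small weight update per sampled transition."""
    w = [0.0] * k
    for _ in range(sweeps):
        for phi, r, phi_next in transitions:
            delta = r + gamma * dot(w, phi_next) - dot(w, phi)  # TD error
            for i in range(k):
                w[i] += alpha * delta * phi[i]
    return w

# Hypothetical two-state chain: s0 -> s1 (reward 0), s1 -> terminal (reward 1).
# One-hot features; the terminal state is represented by the zero vector.
BATCH = [
    ([1.0, 0.0], 0.0, [0.0, 1.0]),
    ([0.0, 1.0], 1.0, [0.0, 0.0]),
]

w_lstd = lstd(BATCH, k=2, gamma=0.9)
w_td = td0(BATCH, k=2, gamma=0.9, alpha=0.1, sweeps=2000)
```

In this toy setting both weight vectors describe the same fixed point. LSTD gets there with a single linear solve over the batch, while TD(0) needs many passes and a step size, which is the trade-off the description refers to.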

Category:

Science & Technology

License:

Standard YouTube License
