Learning a repertoire of skills based on intrinsic motivation in the iCub simulator (by Leo Pape)

Uploaded by on Jan 8, 2012

Right: simulation setup. Sensory input to the robot consists of the horizontal coordinates of the ball, the vertical orientation of the stick, and the discretized pose of the two active joints. The prediction targets are the vertical stick orientation and the two horizontal ball coordinates. The robot starts from a randomly selected pose and can move two right-arm joints at a random velocity that is unknown to the robot. As a result, hitting the ball displaces it randomly, whereas toppling the stick is deterministic.
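To make the input and target layout concrete, here is a minimal sketch in Python. It is not the code used in the video: the function names, the number of bins, the joint limits, and the assumption that targets are the values observed after a movement are all hypothetical and chosen only for illustration.

```python
def discretize(angle, low, high, bins):
    """Map a joint angle in [low, high] to a bin index in 0..bins-1.

    The bin count and joint limits are illustrative assumptions;
    the description does not state them.
    """
    if not low < high:
        raise ValueError("low must be smaller than high")
    if bins < 1:
        raise ValueError("bins must be at least 1")
    clipped = min(max(angle, low), high)
    index = int((clipped - low) / (high - low) * bins)
    return min(index, bins - 1)  # the upper limit falls into the last bin


def make_input(ball_xy, stick_orientation, joint_angles, joint_limits, bins=5):
    """Build the sensory input: ball x/y, stick orientation, discretized pose."""
    pose = tuple(
        discretize(angle, low, high, bins)
        for angle, (low, high) in zip(joint_angles, joint_limits)
    )
    return (ball_xy[0], ball_xy[1], stick_orientation) + pose


def make_target(next_ball_xy, next_stick_orientation):
    """Build the prediction target: stick orientation and ball x/y.

    Assumption: the values are those observed after the movement.
    """
    return (next_stick_orientation, next_ball_xy[0], next_ball_xy[1])
```

With two active joints, `make_input` returns a five-element tuple and `make_target` a three-element tuple, matching the three prediction targets named in the description.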

Left: various training statistics. The prediction errors first decrease and then fluctuate around small values, because the predictors are limited. The learning progress decreases to 0, and the robot switches to random exploration once no learning progress can be obtained for any of the skills. Because the velocity is random, hitting the ball occasionally yields a prediction improvement, which in turn improves the associated skill. Note that the learned skills (hitting the ball or toppling the stick) remain successful after the predictors cease to improve.
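The switch between practicing a skill and exploring randomly can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the video: learning progress is taken here to be the drop in mean prediction error between two consecutive windows, and the names `learning_progress` and `choose_behavior`, the window size, and the threshold are all hypothetical.

```python
def learning_progress(errors, window=10):
    """Estimate learning progress from a history of prediction errors.

    Assumption: progress is the decrease in mean error between the
    previous window and the most recent window, floored at zero.
    """
    if len(errors) < 2 * window:
        return 0.0  # not enough data to compare two windows
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return max(0.0, sum(older) / window - sum(recent) / window)


def choose_behavior(error_histories, threshold=1e-3, window=10):
    """Pick the skill with the most learning progress, or explore.

    error_histories maps a skill name to its list of prediction errors.
    Returns the string "explore" when no skill shows progress above the
    (illustrative) threshold.
    """
    progress = {
        name: learning_progress(history, window)
        for name, history in error_histories.items()
    }
    if not progress:
        return "explore"
    best = max(progress, key=progress.get)
    if progress[best] <= threshold:
        return "explore"
    return best
```

In this sketch, a skill whose errors have flattened out contributes zero progress, so once every predictor stops improving the function returns `"explore"`. An occasional improvement for one skill, as described for the ball, would make that skill the selected behavior again.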
