Right: simulation setup. Sensory input to the robot consists of the horizontal ball coordinates, the vertical orientation of the stick, and the discretized pose of the two active joints. Prediction targets are the vertical stick orientation and the two horizontal ball coordinates. The robot starts from a randomly selected pose and can move two right-arm joints at a random velocity that is unknown to the robot. As a result, the ball is displaced randomly when the robot hits it, whereas the stick topples deterministically.
Left: various training statistics. The prediction errors decrease and then fluctuate around small values (because the predictors are limited), the learning progress decreases to 0, and the robot switches to random exploration once no further learning progress can be obtained for the skills. Because of the randomness in velocity, hitting the ball occasionally yields a prediction improvement, which in turn improves the associated skill. Note that the learned skills remain successful (hitting the ball or toppling the stick) after the predictors cease to improve.
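The switch from skill learning to random exploration described above can be illustrated with a minimal sketch. The caption does not state how learning progress is computed, so the windowed error difference, the threshold, and the names `learning_progress` and `choose_behavior` below are all assumptions made for illustration, not the method used in the experiment.

```python
def learning_progress(errors, window=5):
    """Assumed measure: drop in mean prediction error between the
    previous window and the most recent window (never negative)."""
    if len(errors) < 2 * window:
        return 0.0
    older = sum(errors[-2 * window:-window]) / window
    recent = sum(errors[-window:]) / window
    return max(0.0, older - recent)


def choose_behavior(error_histories, threshold=1e-3):
    """Pick the skill whose predictor is improving fastest; fall back to
    random exploration when no skill shows progress above the threshold."""
    progress = {name: learning_progress(errs)
                for name, errs in error_histories.items()}
    best = max(progress, key=progress.get)
    if progress[best] <= threshold:
        return "explore"  # hypothetical label for random exploration
    return best


# Ball predictor is still improving, stick predictor has plateaued.
histories = {"ball": [1.0] * 5 + [0.5] * 5, "stick": [0.1] * 10}
print(choose_behavior(histories))
```

Under these assumptions, a predictor whose error only fluctuates around a small value yields progress near zero, so the threshold decides when the agent stops practicing that skill and explores instead.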