During this stage, objects are placed sometimes out-of-reach of the robot. This causes a noticable change in the previously learned policy's transition dynamics. The robot autonomous adapts to this change by discovering the position of the object with respect to the success of a the "reach" action, thus learning a new policy to increase its rate of reward.
0:32 fail lol
KarikoEmpire34 2 years ago