Eyebots guide Footbots between a source and a target location. They maintain two policies for guiding the Footbots, one pointing towards the target and one pointing towards the source. The Eyebots sample from these policies to give instructions to the Footbots, and they observe Footbot behavior to update the policies. Specifically, if an Eyebot sees a Footbot that is travelling from the source to the target, it increases the policy pointing to the source for the direction that the Footbot is coming from, and it decreases the policy pointing to the target for that same direction. When an Eyebot sees Footbots performing obstacle avoidance, it decreases both policies in the direction of these Footbots, since obstacle avoidance indicates the presence of obstacles or congestion among Footbots.
In the specific experiment we carry out here, we can see how the system learns to send the Footbots around some obstacles on the ground.
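The update rule described above can be sketched in a few lines of Python. This is only an illustration under several assumptions that the description does not state: directions are discretized into a fixed number of bins, each policy is a list of positive weights, updates are additive with a fixed step, weights are clamped to a small floor, and Footbots travelling from the target to the source are handled symmetrically. The names `EyebotPolicies`, `observe_travel`, `observe_avoidance`, and `sample` are hypothetical.

```python
import random


class EyebotPolicies:
    """Hypothetical sketch of the two policies held by one Eyebot."""

    def __init__(self, n_directions=8, step=0.1, floor=0.01):
        # Assumption: directions are discretized into n_directions bins.
        self.n = n_directions
        self.step = step      # assumed fixed additive update size
        self.floor = floor    # assumed lower bound so weights stay positive
        self.to_target = [1.0] * n_directions
        self.to_source = [1.0] * n_directions

    def _bump(self, weights, direction, delta):
        # Additive update, clamped at the floor.
        weights[direction] = max(self.floor, weights[direction] + delta)

    def observe_travel(self, came_from, heading_to_target):
        """Update after seeing a Footbot arrive from bin `came_from`."""
        if heading_to_target:
            # The Footbot comes from the source side: that direction is
            # good for reaching the source and bad for reaching the target.
            self._bump(self.to_source, came_from, +self.step)
            self._bump(self.to_target, came_from, -self.step)
        else:
            # Assumed symmetric case for target-to-source travel.
            self._bump(self.to_target, came_from, +self.step)
            self._bump(self.to_source, came_from, -self.step)

    def observe_avoidance(self, direction):
        """Obstacle avoidance seen in `direction`: penalize both policies."""
        self._bump(self.to_source, direction, -self.step)
        self._bump(self.to_target, direction, -self.step)

    def sample(self, goal, rng=random):
        """Sample a direction bin to instruct a Footbot heading to `goal`."""
        weights = self.to_target if goal == "target" else self.to_source
        return rng.choices(range(self.n), weights=weights, k=1)[0]
```

For example, after `observe_avoidance(3)` the weight of bin 3 drops in both policies, so later calls to `sample("target")` and `sample("source")` are less likely to send Footbots that way.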