This is the first successful experiment of my master's thesis. In about 30 trials, the robot learns the parameters of a PD controller that tries to minimize the distance to the ball. It uses the episodic Natural Actor-Critic algorithm with a constant baseline to compute the policy gradient.
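For readers unfamiliar with the method, here is a minimal sketch of the gradient step in episodic Natural Actor-Critic with a constant baseline. It is not the code used in the experiment. It assumes each episode has already been summarized by its return and by the summed gradient of the log-policy with respect to the controller parameters; the function name `enac_gradient`, the synthetic data, the starting gains, and the step size are all hypothetical.

```python
import numpy as np


def enac_gradient(log_grad_sums, returns):
    """Estimate a natural policy gradient from a batch of episodes.

    log_grad_sums: shape (n_episodes, n_params); for each episode, the sum
        over time steps of grad log pi(a_t | s_t) w.r.t. the policy parameters.
    returns: shape (n_episodes,); the (discounted) return of each episode.

    Solves the least-squares problem  [psi_i, 1] @ [w, b] ~= R_i,
    where w is the natural-gradient estimate and b the constant baseline.
    """
    psi = np.asarray(log_grad_sums, dtype=float)
    episode_returns = np.asarray(returns, dtype=float)
    # Extra column of ones lets the regression fit the constant baseline.
    design = np.hstack([psi, np.ones((psi.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(design, episode_returns, rcond=None)
    return solution[:-1], solution[-1]


# Synthetic illustration only: two parameters standing in for the
# proportional and derivative gains of a PD controller.
rng = np.random.default_rng(0)
psi = rng.normal(size=(30, 2))
returns = psi @ np.array([0.5, -0.2]) + 3.0  # made-up linear returns

w, baseline = enac_gradient(psi, returns)
gains = np.array([1.0, 0.1])   # hypothetical starting gains
gains = gains + 0.1 * w        # gradient ascent with an assumed step size
```

In a real run, the summed log-policy gradients would come from the exploration noise of a stochastic policy around the current gains, and the update would be repeated over successive batches of trials.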