Second Set of Tests

Specs for the test: Only one agent is generated, and it runs for 300 000 iterations. When the agent reaches the goal, the goal and every obstacle are moved to new random locations.
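The relocation rule described above can be sketched as follows. This is only a minimal sketch, assuming a square grid world. The grid size, the function name `relocate`, and the representation of positions as (x, y) tuples are hypothetical and not taken from the test setup.

```python
import random

def relocate(num_obstacles, grid_size, agent_pos, rng=random):
    """Pick new, distinct random cells for the goal and every obstacle.

    Assumption (not from the test specs): positions are (x, y) cells on a
    grid_size x grid_size grid, and no new cell may coincide with the
    agent's current cell.
    """
    # All cells except the one the agent currently occupies.
    cells = [(x, y)
             for x in range(grid_size)
             for y in range(grid_size)
             if (x, y) != agent_pos]
    # Sample without replacement so the goal and obstacles never overlap.
    chosen = rng.sample(cells, num_obstacles + 1)
    goal, obstacles = chosen[0], chosen[1:]
    return goal, obstacles
```

In a training loop, a function like this would be called each time the agent's position equals the goal position.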

The learning step size (alpha) is 0.1 and the discount factor (gamma) is 0.1.
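To make the role of these two parameters concrete, here is a minimal sketch of a one-step tabular Q-learning update using the values above. The text does not state which learning rule the agent uses, so Q-learning, the dictionary-based table, and the names `q_update`, `ALPHA`, and `GAMMA` are assumptions for illustration only.

```python
from collections import defaultdict

ALPHA = 0.1  # step size of learning, from the test specs
GAMMA = 0.1  # discount factor, from the test specs

def q_update(q, state, action, reward, next_state, actions):
    """Apply one tabular Q-learning update in place and return the new value.

    Assumption: q maps (state, action) pairs to floats and returns 0.0
    for pairs that have not been visited yet.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus the discounted best next value.
    td_target = reward + GAMMA * best_next
    # Move the current estimate a fraction ALPHA toward the target.
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])
    return q[(state, action)]

# Usage with hypothetical state and action labels.
q = defaultdict(float)
new_value = q_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
```

With gamma set to 0.1, a reward two steps away is weighted by 0.01 in the target, so under this assumed rule the agent's estimates are dominated by near-term rewards.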

Below is an image of the sample test scene. The red circle is the goal location and the blue circle is the obstacle.

Below are sample trajectories the agent takes while learning. The left one is after 200 000 iterations and the middle one is after 400 000 iterations. The straight line from the goal passing through the objects is the result of resetting the agent's location after it reaches the goal. (This is while the agent is learning and the goal location is not randomized.) The image on the right shows a trajectory when there is an obstacle.