Result of the learning algorithm

I added trails to the agents to see how they behave. The next step is to give a very low reward when the agent goes past the obstacle, and a very high reward when the agent reaches the goal.
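The planned reward scheme could be sketched roughly as below. This is only an illustration, not the code used in the project: the function name, the constants, and the radii are hypothetical, and "going past the obstacle" is assumed here to mean entering a circular region around it.

```python
import math

# Hypothetical values; the post does not state the actual numbers.
OBSTACLE_PENALTY = -100.0  # "very low" reward
GOAL_REWARD = 100.0        # "very high" reward
STEP_REWARD = 0.0          # neutral reward for every other step


def reward(agent_pos, goal_pos, obstacle_pos,
           goal_radius=1.0, obstacle_radius=1.0):
    """Return the reward for the agent's current 2D position.

    Assumption: the obstacle and the goal are both modelled as circles.
    """
    # Penalise the agent when it is inside the obstacle region.
    if math.dist(agent_pos, obstacle_pos) <= obstacle_radius:
        return OBSTACLE_PENALTY
    # Reward the agent when it is inside the goal region.
    if math.dist(agent_pos, goal_pos) <= goal_radius:
        return GOAL_REWARD
    return STEP_REWARD
```

With the obstacle check first, a position that overlaps both regions counts as a penalty; swapping the two checks would favour the goal instead.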

Also, the scenario I set up is not similar to the one used in the thesis paper.

Here is the result: