I added trails to the agents to see how they behave. The next step is to give a very low reward when the agent goes past the obstacle. Another step is to give a very high reward when the agent reaches the goal.
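A minimal sketch of the planned reward rule follows. It assumes a 2D environment with circular obstacles and a circular goal region, and it treats "going past the obstacle" as the agent's position falling inside an obstacle. The names `Circle` and `compute_reward`, the reward values, and the choice to end the episode on either event are all hypothetical. None of them come from the actual simulation or the thesis.

```python
import math
from dataclasses import dataclass

# Hypothetical reward values: the plan only says "very low" and "very high".
OBSTACLE_PENALTY = -100.0
GOAL_REWARD = 100.0
STEP_REWARD = 0.0  # reward for an ordinary step (assumed neutral)


@dataclass
class Circle:
    """Circular region in 2D, used here for both obstacles and the goal."""
    x: float
    y: float
    radius: float

    def contains(self, px: float, py: float) -> bool:
        # Point is inside (or on the edge of) the circle.
        return math.hypot(px - self.x, py - self.y) <= self.radius


def compute_reward(pos, obstacles, goal):
    """Return (reward, done) for an agent at pos = (x, y).

    Assumption: entering an obstacle region counts as "going past the
    obstacle", and both hitting an obstacle and reaching the goal end
    the episode.
    """
    x, y = pos
    if any(ob.contains(x, y) for ob in obstacles):
        return OBSTACLE_PENALTY, True
    if goal.contains(x, y):
        return GOAL_REWARD, True
    return STEP_REWARD, False


obstacles = [Circle(5.0, 5.0, 1.0)]
goal = Circle(10.0, 10.0, 0.5)
print(compute_reward((5.2, 5.1), obstacles, goal))   # (-100.0, True)
print(compute_reward((10.0, 10.2), obstacles, goal)) # (100.0, True)
print(compute_reward((0.0, 0.0), obstacles, goal))   # (0.0, False)
```

The obstacle check comes before the goal check, so a position that overlaps both is penalized. That keeps the agent from being rewarded for cutting through an obstacle on its way to the goal.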
Also, the scenario I set up differs from the one used in the thesis paper.
Here is the result: