The new approach produces strange results. Here is an initial result:
Here is how they look now:
First test with the KTH scene containing Q-learning agents:
To get a sense of how they should behave, here is the RVO2 library scene:
As the title suggests, I changed the reward approach. Below are the first results.
And here is the circle scene:
Next is the two-agent test: