Learning Agents

I managed to train an agent to reach its goal. My error was in the reward calculation: Unity gives the angle between two vectors in degrees, but the cosine function takes radians as input. I fixed this and the result is much better.
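A minimal sketch of the kind of fix this was, assuming the reward uses the cosine of the angle between the agent's forward direction and the direction to the goal (the method and variable names are illustrative, not the actual project code):

```csharp
using UnityEngine;

// Vector3.Angle returns degrees, while Mathf.Cos expects radians,
// so the angle has to be converted before taking the cosine.
float ComputeHeadingReward(Transform agent, Transform goal)
{
    float angleDeg = Vector3.Angle(agent.forward, goal.position - agent.position);
    return Mathf.Cos(angleDeg * Mathf.Deg2Rad);   // was Mathf.Cos(angleDeg) before the fix
}
```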

Then I tried it with obstacles, which produced a stranger result. I am not sure why, but the agent moves towards the obstacles, yet it still reaches its goal regardless.

Here is another test with better collision avoidance. I am not sure yet why it only produces better results some of the time.

Now I will try this with different scenarios and try to save the Q-table in some way (maybe a text file).
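One simple option for the text file would be to write the table out line by line; a rough sketch, assuming the Q-table is a state-by-action float array (the names here are placeholders, not my actual classes):

```csharp
using System.IO;

// Sketch: persist a Q-table (rows = states, columns = actions) as plain text,
// one state per line with space-separated action values.
void SaveQTable(float[,] qTable, string path)
{
    using (var writer = new StreamWriter(path))
    {
        for (int s = 0; s < qTable.GetLength(0); s++)
        {
            string[] row = new string[qTable.GetLength(1)];
            for (int a = 0; a < qTable.GetLength(1); a++)
                row[a] = qTable[s, a].ToString("F6");
            writer.WriteLine(string.Join(" ", row));
        }
    }
}
```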

More Iterations

I changed the logic of the initial Unity program. Now the learning iteration is repeated 100 times per frame, so over 100 frames we get 100 × 100 = 10,000 iterations.

This trains the agent faster and removes the need for long simulation runs. I also fixed several errors in the learning algorithm.
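In Unity terms, this amounts to running the learning step in a loop inside the per-frame update; a rough sketch, assuming a single method that performs one Q-learning iteration (the names are assumptions):

```csharp
using UnityEngine;

// Sketch: repeat the learning step many times per rendered frame.
public class Trainer : MonoBehaviour
{
    const int IterationsPerFrame = 100;

    void Update()
    {
        for (int i = 0; i < IterationsPerFrame; i++)
            RunLearningIteration();
    }

    void RunLearningIteration()
    {
        // One step of the learning loop: pick an action, apply it,
        // observe the reward and update the Q-table (project-specific).
    }
}
```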

Here is the latest version of the video. Note that the agent is still learning.

Update on the progress

I am trying to get the circle detection working. So far, this is what I have got:

[Image: status-of-tiles]

I am not certain yet how to get this working; the vector math is not working correctly at the moment.

I also ran the performance test on the first approach again, and it seems there is no longer a problem.

Map Generation

The current approach of simply shooting rays and checking intersections was slow and caused the program to lag. To improve performance, I added a grid: the idea is to have a tile-based map and then find all the tiles within the circle defined by the agent's radius.

So I started doing that and here is the first result:

[Image: tile-test]

This is not the correct result, since the updated tiles should have formed a circle, but further improvements are on the way.

This is nevertheless a faster approach, as demonstrated in the following video, even though I am currently iterating through every tile in every frame.
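In code, the per-frame check should end up looking roughly like the sketch below once the circle test is right; it is a standalone version with illustrative parameters, not my actual class layout:

```csharp
using UnityEngine;

// Sketch: mark every tile whose center lies inside the agent's radius.
public static class TileCircle
{
    public static bool[,] TilesInsideRadius(Vector2 agentPos, float radius,
                                            float tileSize, int gridWidth, int gridHeight)
    {
        var inside = new bool[gridWidth, gridHeight];
        for (int x = 0; x < gridWidth; x++)
        {
            for (int y = 0; y < gridHeight; y++)
            {
                Vector2 center = new Vector2((x + 0.5f) * tileSize, (y + 0.5f) * tileSize);
                // Compare squared distances to avoid a square root per tile.
                inside[x, y] = (center - agentPos).sqrMagnitude <= radius * radius;
            }
        }
        return inside;
    }
}
```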

Update on Calculations

The angle calculations for the states were not going well. I updated the obstacle state calculation, and now the degree of occupancy is calculated correctly.

The remaining tasks are to calculate the goal state angle properly and to implement the reward calculation properly.

Here are the debug messages I am getting right now:

[Screenshot: debug messages]

Updating the DoC calculation

I realized that my state calculations were wrong, starting with the degree of occupancy. The agent was always registering objects as if they were behind it, even though they were right in front of it.

The reason is that I was only calculating the angle between the agent's forward direction and the obstacle's position. This is wrong, since I also have to take every part of the obstacle into account.

I looked around the internet for this problem. Field of view seemed like a good place to start, and then I found out about Unity's raycasting functions. Now I am shooting rays over 360 degrees and checking whether each one hits something. Here is a sample result:

[Image: more-debugging]

There are several white lines going towards a sphere, which is the obstacle, so the output is correct.
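The sweep itself looks roughly like the sketch below; the class name, field names and values are illustrative, not the actual project code:

```csharp
using UnityEngine;

// Rough sketch of the 360-degree ray sweep around the agent.
public class ObstacleScan : MonoBehaviour
{
    public float scanRange = 5f;

    public void Scan()
    {
        for (int deg = 0; deg < 360; deg++)
        {
            // Rotate the agent's forward direction around the up axis by `deg` degrees.
            Vector3 dir = Quaternion.AngleAxis(deg, Vector3.up) * transform.forward;

            if (Physics.Raycast(transform.position, dir, out RaycastHit hit, scanRange))
            {
                // hit.collider identifies which obstacle this ray reached.
                Debug.DrawLine(transform.position, hit.point, Color.white);
            }
        }
    }
}
```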

The next step is to optimize this, since shooting 360 rays and iterating over the colliders they hit to find the obstacles is very expensive.

Debugging Functions

I added some debugging functions to the application. I am drawing lines and a circle, so now I am able to see the agent's radius and which objects it is perceiving.
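A rough sketch of this kind of debug drawing using Unity gizmos (the field names are illustrative, not my actual code):

```csharp
using UnityEngine;

// Sketch: draw the agent's perception radius and a line to each perceived object.
public class AgentDebugDraw : MonoBehaviour
{
    public float perceptionRadius = 3f;
    public Transform[] perceivedObjects;

    void OnDrawGizmos()
    {
        // Outline of the agent's perception radius.
        Gizmos.color = Color.yellow;
        Gizmos.DrawWireSphere(transform.position, perceptionRadius);

        // A line towards every object the agent currently perceives.
        Gizmos.color = Color.white;
        foreach (var obj in perceivedObjects)
        {
            if (obj != null)
                Gizmos.DrawLine(transform.position, obj.position);
        }
    }
}
```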

Here is a screenshot of the current version during debugging:

[Screenshot: debug view]

This will be helpful for debugging the program.

Bug Fixes

I updated the action class. Now I only get which action to take (an integer from 0 to 7), then map it to a rotation angle and rotate the agent by that angle.

I also updated the left and right vectors so that they follow the agent's forward direction. The left and right vectors were fixed values before.
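A rough sketch of both changes; the names and the 45-degree step between the eight actions are assumptions, not necessarily the project's actual values:

```csharp
using UnityEngine;

// Sketch: map an action index (0..7) to a rotation and keep the side vectors
// tied to the agent's current forward direction.
public class AgentActions : MonoBehaviour
{
    public Vector3 leftDir;
    public Vector3 rightDir;

    public void ApplyAction(int action)   // action is an integer from 0 to 7
    {
        // Map the action index to a rotation angle and turn the agent.
        float angle = action * 45f;       // 8 actions, 45 degrees apart (assumed)
        transform.Rotate(0f, angle, 0f);

        // Recompute the side vectors from the new forward direction,
        // instead of keeping them as fixed world-space values.
        rightDir = transform.right;
        leftDir = -transform.right;
    }
}
```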

The tests, however, are getting stranger: the agent is now colliding with an obstacle.

The next steps are to add more detailed debug information and gizmos. I will also test the state and reward calculations.

Q-Learning Continued

I am continuing to implement Q-learning. I updated some functions and the agent's behavior has improved.
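For reference, the generic tabular Q-learning update looks like this; it is a sketch of the standard formula, not my exact code:

```csharp
// q is the Q-table indexed by [state, action]; alpha is the learning rate
// and gamma the discount factor.
void UpdateQ(float[,] q, int state, int action, float reward, int nextState,
             float alpha, float gamma)
{
    // Best achievable Q-value from the next state.
    float bestNext = q[nextState, 0];
    for (int a = 1; a < q.GetLength(1); a++)
        if (q[nextState, a] > bestNext)
            bestNext = q[nextState, a];

    // Move the current estimate towards the bootstrapped target.
    q[state, action] += alpha * (reward + gamma * bestNext - q[state, action]);
}
```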

It is obvious, though, that the agent's left and right vectors do not change with the agent's motion. Fixing that is the next step, along with a proper reward function.

Here is what the result looks like: