Thoughts

I have reached a point where I keep trying the same thing and hoping the result will change.

I honestly do not know where the problem is. The agent reaches the goal. I checked the learning iteration; that is also correct. The reward logic is now fixed, and I have checked the state calculation a million times.

Still, the agents are not able to avoid each other when they walk towards each other in a circle. Q-learning is what the paper used.
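For reference, this is roughly the tabular Q-learning loop I believe the paper implies (the constants, names, and state/action encoding here are my own placeholders, not the paper's):

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # assumed hyperparameters

Q = defaultdict(float)  # (state, action) -> estimated value

def choose_action(state, n_actions):
    # epsilon-greedy: explore with probability EPSILON, otherwise greedy
    if random.random() < EPSILON:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, n_actions, done):
    # standard tabular Q-learning target: r + gamma * max_a' Q(s', a')
    target = reward if done else reward + GAMMA * max(
        Q[(next_state, a)] for a in range(n_actions))
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

If each agent's state does not include the other agents' positions (or relative bearings), this update can converge perfectly and the agents will still walk into each other, because collisions are simply invisible to the value function.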

The rewards they presented just do not work. Is there something else I need to do?

I think the training setup is a problem for the agents. Maybe I can train in a better environment? Right now, in each iteration there is an obstacle between the agent and the goal. I do not know what else to do.
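One cheap thing to try along those lines is randomizing the layout every episode, so the policy does not overfit to the single "obstacle between agent and goal" configuration. A sketch of what I mean (the grid size and cell layout are assumptions, not my actual setup):

```python
import random

def reset_episode(grid_size=10):
    # Sample three distinct cells for agent, goal, and obstacle each
    # episode, instead of always placing the obstacle on the line
    # between the agent and the goal.
    cells = [(x, y) for x in range(grid_size) for y in range(grid_size)]
    agent, goal, obstacle = random.sample(cells, 3)
    return agent, goal, obstacle
```

The idea is just domain randomization at the episode level: if the agent only ever sees one relative arrangement during training, it has no reason to learn the general avoidance behavior the test scenario (walking towards each other in a circle) requires.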
