An RL agent for Minirace
In the level 2 version of the game, the observed state consists of two numbers: dx1 and dx2. The first value (dx1) is the
same as dx in level 1: the relative position of the track in front of the car. The second value (dx2) is the position of the subsequent piece of track, relative to the track in front of the car.
A second difference is that the track can be more curved: sometimes consecutive sections of the track
only overlap on the left or right edge. This means the agent cannot always drive in the middle of the track, because the car can only move one step to the left or right at a time.
For this task, you can initialise the environment like this:
therace = Minirace(level=2)
In this level, step() returns two unnormalised pixel difference values.
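The exact interface depends on the code provided with the assignment; the following is only a minimal sketch of the interaction loop, assuming a Gym-style reset()/step() interface, a (state, reward, done) return value, and a 0/1/2 action encoding. These names and the module name are assumptions to adapt to the class you were given.

import numpy as np
from minirace import Minirace    # module name is an assumption

therace = Minirace(level=2)      # level 2: the state is (dx1, dx2)
state = therace.reset()          # assumed to return the initial (dx1, dx2)
done = False
total_reward = 0.0
while not done:
    action = np.random.randint(3)                  # random policy: left / straight / right (assumed encoding)
    state, reward, done = therace.step(action)     # state holds the two unnormalised pixel differences
    total_reward += reward
print('episode reward:', total_reward)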
Steps
1. Create an RL agent that finds a policy using
(all of) the level 2 state information. A suggested discount factor is γ = 0.95.
2. You can choose the algorithm (one possible choice is sketched after this list).
3. Try to train an agent that achieves a running reward > 50.
4. If you use a neural network, do not go overboard with the number of hidden layers,
as this will significantly increase training time. Try one hidden layer.
5. Write a description explaining how your approach works, and how it performs. If
some of your attempts are unsuccessful, also describe some of the things
that did not work, and which changes made a difference.
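One possible choice of algorithm is a small DQN. The sketch below is one way it could look, not the required solution: a PyTorch network with a single hidden layer maps the two state values (dx1, dx2) to Q-values for three actions and is trained with γ = 0.95 and Adam. A target network is omitted for brevity; the environment interaction is the loop shown earlier, with each transition appended to the buffer and train_step() called after every step.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

GAMMA = 0.95        # suggested discount factor
LR = 1e-3           # Adam learning rate; tune in the 5e-3 to 1e-4 range
EPSILON = 0.1       # exploration rate (could also be annealed)
BATCH_SIZE = 64

# One hidden layer keeps training time manageable.
qnet = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(qnet.parameters(), lr=LR)
buffer = deque(maxlen=10_000)   # replay buffer of (s, a, r, s', done) tuples

def select_action(state):
    # Epsilon-greedy action selection over the 3 discrete actions.
    if random.random() < EPSILON:
        return random.randrange(3)
    with torch.no_grad():
        q = qnet(torch.tensor(state, dtype=torch.float32))
    return int(q.argmax().item())

def train_step():
    # One gradient step on a minibatch sampled from the replay buffer.
    if len(buffer) < BATCH_SIZE:
        return
    batch = random.sample(buffer, BATCH_SIZE)
    s, a, r, s2, done = map(np.array, zip(*batch))
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(s2, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    q = qnet(s).gather(1, a).squeeze(1)                              # Q(s, a) for the actions taken
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * qnet(s2).max(1).values     # one-step TD target
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

If the running reward plateaus, typical refinements are normalising the two state inputs and annealing EPSILON over training.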
What to submit:
• Submit the Python code of your solutions.
• For your report, describe the solution, mention the Test-Average and Test-Standard-Deviation (see the evaluation sketch below), and include the Training Reward plot described above.
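One hedged sketch of how the Test-Average and Test-Standard-Deviation could be obtained: run the trained agent greedily for a number of test episodes and report the mean and standard deviation of the episode rewards. The reset()/step() names are assumptions, and greedy_action is a placeholder for your own greedy policy.

import numpy as np

def evaluate(env, greedy_action, n_episodes=50):
    # Returns (mean, std) of episode rewards under the greedy policy.
    returns = []
    for _ in range(n_episodes):
        state = env.reset()
        done, total = False, 0.0
        while not done:
            state, reward, done = env.step(greedy_action(state))
            total += reward
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))

# Example (names assumed): test_avg, test_std = evaluate(therace, greedy_action)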
Tips
1. For the RL tasks, it often takes some time until the learning picks up, but they
should not take hours. If the agent doesn't learn, explore different learning rates.
For Adam, try values between 5e-3 and 1e-4 (see the short sketch after these tips).
2. Even if the learning does not work, remember that we would like to see that you
understood the ideas behind the code. Describe the ideas that you tried, and still
submit your code, explaining what the problem was.
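A short sketch of the learning-rate exploration mentioned in tip 1; make_agent() and train() are hypothetical stand-ins for your own agent construction and training loop, and only the Adam learning-rate range comes from the tip itself.

for lr in (5e-3, 1e-3, 5e-4, 1e-4):
    agent = make_agent(lr=lr)          # hypothetical: builds the agent and its Adam optimiser with this lr
    running_rewards = train(agent)     # hypothetical: trains the agent and returns the running-reward curve
    print(f'lr={lr:.0e}: final running reward {running_rewards[-1]:.1f}')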