Apply a torque to swing the pole up and balance it pointing straight up. This is the exact environment your PPO agent trains on, so see if you can beat it.
Observation (what the agent sees): -1.00, 0.00, 0.00
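The three observation values are consistent with Gymnasium's `Pendulum-v1`, whose observation is [cos θ, sin θ, angular velocity]. The page does not name the environment, so treat that layout as an assumption. Under it, a reading of -1.00, 0.00, 0.00 is simply the pole hanging straight down (θ = π) at rest. A minimal sketch of the encoding, with hypothetical helper names:

```python
import math


def observe(theta: float, theta_dot: float) -> tuple[float, float, float]:
    """Encode the pendulum state as (cos θ, sin θ, θ_dot), assuming Pendulum-v1's layout."""
    # cos/sin instead of the raw angle keeps the observation continuous
    # across the wrap-around at ±π
    return (math.cos(theta), math.sin(theta), theta_dot)


def recover_angle(obs: tuple[float, float, float]) -> float:
    """Recover θ in (-π, π] from the (cos, sin) pair; θ = 0 is straight up."""
    cos_t, sin_t, _ = obs
    return math.atan2(sin_t, cos_t)


# Hanging straight down (θ = π), at rest
obs = observe(math.pi, 0.0)
print([round(x, 2) for x in obs])  # [-1.0, 0.0, 0.0]
```

Encoding the angle as a (cos, sin) pair means θ = π and θ = -π, which are the same physical state, produce the same observation.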
Applied torque (your action, range [-2, 2]). Simulation speed: 20 steps/s.
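At 20 steps/s, each step advances the simulation by 0.05 s. The sketch below shows how one clipped torque action moves the pole, assuming the environment uses `Pendulum-v1`'s dynamics and default constants (g = 10, m = l = 1, angular speed clipped to ±8). None of those constants appear on this page, and `pendulum_step` is a hypothetical helper, not the environment's API.

```python
import math

# Constants assumed from Gymnasium's Pendulum-v1 defaults (not stated on this page)
G = 10.0          # gravitational acceleration
M = 1.0           # pole mass
L = 1.0           # pole length
DT = 1.0 / 20     # 20 steps/s -> 0.05 s per step
MAX_TORQUE = 2.0  # action range [-2, 2]
MAX_SPEED = 8.0   # angular-velocity clip


def pendulum_step(theta: float, theta_dot: float, torque: float) -> tuple[float, float]:
    """Hypothetical helper: advance the pendulum by one step of length DT."""
    u = max(-MAX_TORQUE, min(MAX_TORQUE, torque))  # clip the action to [-2, 2]
    # With θ = 0 upright, the sin(θ) term pushes the pole away from upright
    theta_ddot = 3 * G / (2 * L) * math.sin(theta) + 3.0 / (M * L**2) * u
    # Semi-implicit Euler: update and clip the velocity first, then use it for the angle
    new_theta_dot = max(-MAX_SPEED, min(MAX_SPEED, theta_dot + theta_ddot * DT))
    new_theta = theta + new_theta_dot * DT
    return new_theta, new_theta_dot


# Full positive torque, starting from rest at the bottom (θ = π)
print(pendulum_step(math.pi, 0.0, 2.0))
```

With these constants, gravity contributes up to 3g/(2l) = 15 rad/s² of angular acceleration near horizontal, while the largest torque contributes only 3u/(ml²) = 6 rad/s². Holding one direction therefore cannot push the pole straight up; you have to pump energy in by swinging back and forth first.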
Hold ←/A or →/D (or the on-screen buttons) to apply a torque of ±2; let go for zero torque. The pole starts hanging near the bottom.
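The keyboard controls amount to a three-valued action. The page does not say which key gives which sign, so the sketch below assumes ←/A maps to -2 and →/D to +2, and that holding both cancels out. `keys_to_torque` is a hypothetical name.

```python
MAX_TORQUE = 2.0  # magnitude of the keyboard torque


def keys_to_torque(left_held: bool, right_held: bool) -> float:
    """Map held keys to a torque action (sign convention assumed)."""
    torque = 0.0
    if left_held:   # ← or A: assumed negative torque
        torque -= MAX_TORQUE
    if right_held:  # → or D: assumed positive torque
        torque += MAX_TORQUE
    return torque   # neither (or both) held -> 0.0


print(keys_to_torque(False, True))  # 2.0
```

Keyboard play only reaches three points of the action space (-2, 0, +2), whereas the agent can choose any torque in [-2, 2].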
θ = 0 is straight up. Note that no score is shown: designing a reward that captures "upright and still" is your job in the exercise.