A collection of blog posts, some of which may not be linked from elsewhere, on topics that I find interesting.
-
The Nuances of Autoreset Modes and Vectorised Rollouts
Next-Step and Same-Step autoreset handle episode boundaries differently. That difference quietly changes what ends up in your rollout buffer and how GAE must...
-
How does Retrace fix Off-Policyness?
When your RL agent trains on data collected by an older policy, its value estimates become biased. Retrace was designed to correct for this.
-
Generalized Advantage Estimation (GAE) Explained
Generalised Advantage Estimation (GAE) is a critical component of PPO, but how does it work and what does it achieve?