Requires WebGPU.
avg500: -4.6 (last 500 episodes)
peak: 3959.3 (best window)
roll/s: 20.68 (20-step avg)
progress: 4388 (562749 episodes)
But at around 4K avg score you should see it solve the env almost every time.
Just a demo :) optimized for speed over stability.
Reward structure: Step: -1, Dot: +100, Win: +1000, so ~4k is the max theoretical score on 6x6.
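To make the ~4k figure concrete, here is a rough back-of-the-envelope sketch. The starting length of 1, the win bonus being paid on top of the last dot, and the helper name `max_score` are all assumptions for illustration, not taken from the demo's source.

```python
from typing import Optional

STEP, DOT, WIN = -1, 100, 1000  # rewards as quoted in the comment above

def max_score(side: int, start_len: int = 1, steps: Optional[int] = None) -> int:
    """Upper bound on episode return for a side x side board (assumed rules)."""
    dots = side * side - start_len  # dots needed to fill the whole board
    if steps is None:
        steps = dots                # optimistic: at least one step per dot
    return dots * DOT + WIN + steps * STEP

print(max_score(6))             # 4465: hard ceiling under these assumptions
print(max_score(6, steps=500))  # 4000: a win that takes 500 steps
```

So a winning run that takes a few hundred steps lands around 4k, which matches the estimate.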
Alternatively it might be a problem with the scoring model in the end game.
That is the point: there is nothing in an intention that we cannot improve. The goal here is no more than 1 unique iteration of the same path.
I noticed that if you go from training to watch and then back, the training score temporarily drops significantly.
"The page https://ppo.gradexp.xyz/ has been detected with suspicious activity. It is not recommended to continue browsing this website."
Same for:
https://ppo.gradexp.xyz/version.js
https://ppo.gradexp.xyz/dist/sizes.js
https://ppo.gradexp.xyz/dist/size_6/manifest.j
Bitdefender here shows it as clean.
Trained and made a viz for the model, and then made it displace text.
Should probably do a proper write-up: https://x.com/i/status/2038367016969724259
Snake Game, training entirely in the browser. Built on tinygrad: the rollout / targets / train graphs are TinyJits authored in Python, then compiled once to WGSL and replayed here under WebGPU.
Observation: flat 10×10 board (100) + 4-dim prev-action one-hot = 104 dims. fc_pi.weight is zero-init so the opening policy is uniform over the legal actions; fc_v uses tinygrad's default Kaiming init.
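A minimal sketch of why zero-init policy weights give a uniform opening policy over the legal actions. The action order (up, right, down, left), the explicit `-inf` mask, and the helper names `build_obs` and `masked_policy` are assumptions for illustration; the demo's actual graphs are tinygrad TinyJits and may be structured differently.

```python
import math

N_ACTIONS = 4  # assumed order: up, right, down, left

def build_obs(board, prev_action):
    """Flatten a 10x10 board and append the prev-action one-hot (104 values)."""
    flat = [float(c) for row in board for c in row]
    one_hot = [1.0 if a == prev_action else 0.0 for a in range(N_ACTIONS)]
    return flat + one_hot

def masked_policy(logits, prev_action):
    """Softmax over the logits with the U-turn action masked out."""
    u_turn = (prev_action + 2) % N_ACTIONS  # opposite direction, by assumption
    masked = [-math.inf if a == u_turn else l for a, l in enumerate(logits)]
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]  # exp(-inf) is 0.0
    z = sum(exps)
    return [e / z for e in exps]

obs = build_obs([[0] * 10 for _ in range(10)], prev_action=1)
# Zero-init fc_pi.weight means all logits are equal (0 here) at the start.
probs = masked_policy([0.0] * N_ACTIONS, prev_action=1)
print(len(obs), probs)
```

With all logits equal, each of the three legal actions gets probability 1/3 and the reversal gets 0.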
Per rollout: T=24 × N=384 parallel snakes (9,216 transitions), then K=3 epochs × 4 mini-batches of PPO updates. GAE γ=0.99, λ=0.95; AdamW wd=0.01; ratio clip ε=0.1; grad-norm 0.5; Huber value β=1, val_coef=1; entropy bonus 0.008333333333333333.
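For readers who have not seen GAE before, here is a plain-Python sketch of the advantage computation with the γ and λ quoted above, for a single snake's rollout. The function name `gae`, the `dones` handling, and the equal mini-batch split are assumptions; this is not the demo's compiled targets graph.

```python
GAMMA, LAM = 0.99, 0.95   # GAE settings quoted above
T, N, MINI_BATCHES = 24, 384, 4

def gae(rewards, values, dones, last_value):
    """Return (advantages, value targets) for one environment's rollout."""
    steps = len(rewards)
    adv = [0.0] * steps
    running = 0.0
    for t in reversed(range(steps)):
        next_value = last_value if t == steps - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0  # no bootstrap across episode ends
        delta = rewards[t] + GAMMA * next_value * not_done - values[t]
        running = delta + GAMMA * LAM * not_done * running
        adv[t] = running
    targets = [a + v for a, v in zip(adv, values)]
    return adv, targets

adv, targets = gae([1.0, 1.0], [0.0, 0.0], [False, True], last_value=5.0)
print(adv)                       # approximately [1.9405, 1.0]
print(T * N)                     # 9216 transitions per rollout
print(T * N // MINI_BATCHES)     # 2304 per mini-batch, assuming an equal split
```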
Action mask + value clip + KL early stop. The 4-dim prev_a obs tail lets fc_pi zero the U-turn logit (the env silently overrides same-axis reversals anyway). Value loss is max(huber(v_new−td), huber(v_clip−td)) at ε=0.2. Approx-KL is sampled after each epoch and breaks the loop at 1.5·kl_target.
Looks like this is for Linux and Windows, on NetBSD I get this issue :(
> WebGPU is not yet available in Release or late Beta builds.
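Back on the value clip and KL early stop from the description above, a minimal plain-Python sketch of both. The definition of `v_clip` as the old value plus a clipped delta, the simple `mean(logp_old - logp_new)` KL estimator, the `update_epoch` callback, and the `kl_target` value used below are all assumptions, since the description does not spell them out.

```python
def huber(x, beta=1.0):
    """Huber loss with threshold beta (quadratic near zero, linear beyond)."""
    ax = abs(x)
    return 0.5 * x * x if ax <= beta else beta * (ax - 0.5 * beta)

def clipped_value_loss(v_new, v_old, target, eps=0.2):
    """max(huber(v_new - target), huber(v_clip - target)), as described above."""
    v_clip = v_old + max(-eps, min(eps, v_new - v_old))  # assumed clip form
    return max(huber(v_new - target), huber(v_clip - target))

def approx_kl(logp_old, logp_new):
    """Simple sample estimate of KL(old || new) from action log-probs."""
    return sum(o - n for o, n in zip(logp_old, logp_new)) / len(logp_old)

def run_epochs(update_epoch, kl_target, k_epochs=3):
    """Run up to k_epochs; stop once approx-KL exceeds 1.5 * kl_target.

    update_epoch is a hypothetical callback that performs one epoch of
    mini-batch updates and returns (logp_old, logp_new) for the batch.
    """
    done = 0
    for _ in range(k_epochs):
        logp_old, logp_new = update_epoch()
        done += 1
        if approx_kl(logp_old, logp_new) > 1.5 * kl_target:
            break
    return done

print(clipped_value_loss(v_new=1.0, v_old=0.0, target=1.0))
print(run_epochs(lambda: ([0.0], [-0.1]), kl_target=0.02))
```

In the first call the unclipped prediction is already on target, but the clipped one is not, so taking the max keeps the loss nonzero. That is the pessimistic behaviour the max is there for.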