If this is your first time hearing about Distill, they have many more articles of equally astounding quality. I’ve been reading them for a while now and love what they do!
Why use differentiable models here? It seems like a big limitation on the CA just to allow backprop. Can't these be trained with genetic algorithms, evolution strategies, or other gradient-free methods like SPSA?
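
For anyone who hasn't run into SPSA, here is a rough sketch of the idea on a toy problem. This is not the article's training setup: `toy_loss`, `TARGET`, and the gain constants are made-up placeholders, and a real CA loss (roll the automaton forward, compare against the target image) would have to be plugged in where `toy_loss` is. The point is only that each step needs two loss evaluations and no gradients.

```python
import random

# Hypothetical stand-in for "run the CA and compare to the target image".
TARGET = [1.0, -2.0, 0.5]

def toy_loss(theta):
    """Squared distance to TARGET; any black-box scalar loss would do."""
    return sum((t - g) ** 2 for t, g in zip(theta, TARGET))

def spsa_minimize(loss, theta, steps=2000, a=0.1, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimal SPSA loop: estimate the gradient from two loss evaluations."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(steps):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        # Perturb every parameter at once with a random +/-1 direction.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Per-coordinate gradient estimate: diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta

if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    result = spsa_minimize(toy_loss, start)
    print("loss before:", toy_loss(start))
    print("loss after: ", toy_loss(result))
```

The cost per step doesn't grow with the number of parameters, which is the appeal, but the gradient estimate is noisy, so whether it would be competitive with backprop on the actual CA is an open question as far as I can tell.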