Last I checked, differentiable programming involved matrix sizes that blow up as 2^n in the number of memory bytes the differentiable program can access. Is the situation any better now? If not, it seems much like training a function approximator to map the memory and register state at cycle i to the state at cycle i+1.
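
To make the comparison concrete, here is a minimal sketch of what I mean by learning the state transition. Everything in it is a made-up illustration, not any real system: the toy "machine" has only two registers (`pc` and `acc`), its `step` rule is an arbitrary affine update I picked so that a linear model can fit it exactly, and `predict` and `train` are hypothetical helper names. It uses plain full-batch gradient descent on squared error.

```python
# Toy "machine" (hypothetical): state = [pc, acc].
# The update rule is an arbitrary affine map chosen for illustration.
def step(state):
    pc, acc = state
    return [pc + 0.1, acc + 0.5 * pc]

# Affine function approximator: next_state ~= W @ state + b.
def predict(W, b, state):
    return [sum(W[i][j] * state[j] for j in range(2)) + b[i] for i in range(2)]

# Full-batch gradient descent on mean squared error.
def train(samples, lr=0.3, epochs=2000):
    W = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    n = len(samples)
    for _ in range(epochs):
        gW = [[0.0, 0.0], [0.0, 0.0]]
        gb = [0.0, 0.0]
        for state, target in samples:
            out = predict(W, b, state)
            for i in range(2):
                err = out[i] - target[i]
                gb[i] += 2 * err / n
                for j in range(2):
                    gW[i][j] += 2 * err * state[j] / n
        for i in range(2):
            b[i] -= lr * gb[i]
            for j in range(2):
                W[i][j] -= lr * gW[i][j]
    return W, b

# Training pairs: (state at cycle i, state at cycle i+1) over a small grid.
grid = [k / 3 for k in range(4)]
samples = [([pc, acc], step([pc, acc])) for pc in grid for acc in grid]
W, b = train(samples)

# Compare the learned transition with the true one on an unseen state.
print(predict(W, b, [0.5, 0.5]), step([0.5, 0.5]))
```

With two registers this is trivial; the question is whether anything like it stays tractable once the state is a realistic amount of memory.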