Oh, I see! Not quite autodiff then, indeed.
> Long sums, btw, are bad for precision. Typically people try to sort them or compute them in hierarchical/butterfly reduce fashion, or use other tricks.
Typical parallel reduction algorithms do precisely this, but for different reasons (parallelisation). I guess I'd have to compute a partially symbolic expression for the error bound depending on the array size, and hope that I'm not going to change the parallel reduction algorithm without changing the error bounds in tests.
It's possible, but finicky. I see why people slap simple relative bounds on things and call it a day. :)
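For the record, here's roughly what I mean by a size-dependent bound. This is a minimal sketch, not anything from a real test suite. It assumes float64 with round-to-nearest and the textbook worst-case bounds for summation: γ_(n-1)·Σ|x_i| for a left-to-right sum and γ_(⌈log2 n⌉)·Σ|x_i| for a balanced pairwise tree, with γ_k = k·u / (1 − k·u). The names `pairwise_sum`, `pairwise_bound`, etc. are made up for illustration, and the recursive split only stands in for whatever tree shape the actual parallel reduction uses.

```python
import math
import sys
from fractions import Fraction

# Unit roundoff for float64 under round-to-nearest (assumption of this sketch).
U = sys.float_info.epsilon / 2


def gamma(k):
    """Worst-case constant gamma_k = k*u / (1 - k*u); valid while k*u < 1."""
    return k * U / (1 - k * U)


def sequential_sum(xs):
    """Plain left-to-right accumulation (explicit loop, no compensation)."""
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise_sum(xs):
    """Balanced tree reduction, a stand-in for a parallel reduce."""
    n = len(xs)
    if n == 0:
        return 0.0
    if n == 1:
        return float(xs[0])
    mid = n // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


def sequential_bound(xs):
    """Bound for left-to-right summation: gamma_(n-1) * sum|x_i|."""
    n = len(xs)
    return gamma(max(n - 1, 0)) * math.fsum(abs(x) for x in xs)


def pairwise_bound(xs):
    """Bound for a balanced tree: gamma_(ceil(log2 n)) * sum|x_i|.

    The depth term is exactly the part that silently goes stale if the
    reduction tree changes shape.
    """
    n = len(xs)
    depth = math.ceil(math.log2(n)) if n > 1 else 0
    return gamma(depth) * math.fsum(abs(x) for x in xs)


def abs_error(computed, xs):
    """Absolute error against the exact rational sum of the float inputs."""
    exact = sum(Fraction(x) for x in xs)
    return float(abs(Fraction(computed) - exact))


if __name__ == "__main__":
    xs = [1.0 / (i + 1) for i in range(1000)]
    print("pairwise:  ", abs_error(pairwise_sum(xs), xs), "<=", pairwise_bound(xs))
    print("sequential:", abs_error(sequential_sum(xs), xs), "<=", sequential_bound(xs))
```

The bound grows with log2(n) for the tree and with n for the plain loop, so a test that hardcodes one of them is implicitly asserting which algorithm is in use. That's the finicky part.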