As a student who has taken machine learning classes, I know that half the time the way I'm being taught also feels like a ridiculous way of describing how my own learning is going: do I understand the problem correctly, do I have to jump ahead, or should I take a step back? The language can matter, and the only reason I recommend category theory is that it helps you take a step outside the space that describes all these successive optimizations. It lets you describe the overall structure of how the mathematical operations interact with each other, in terms of sets of both numbers and functions that are either partially or totally ordered. From that, I would think more could be said about how each independent optimization function relates to the context (problem space) that contains it. One confusing thing seems to be that the actual calculation can have an unknown effect on the resulting function, so being able to think relationally about how an individual calculation fits into the full computation, I dunno, just seems very interesting to me.
Again, handful of salt: I'm no machine learning expert, nor am I an expert in category theory, and I'm sure I'm not being as precise as I'd want to be if this were something I did career-wise (I just code stuff). It's a hobbyist interest, a remnant of a time when I believed I could work on a PhD.
Point is, I want you to be as aware of this as I am: I just see an interesting connection in how machine learning is growing, and I like category theory. It's the math of math, or the logic of math, something like that. At the very least, having more than one way of seeing the problem can't hurt, can it? If both views yield the same statements, that's at least saying something slightly more concrete.
I'm not sure I have any key insights to offer, just that the balance between each individual calculation and the overall direction of machine learning seems profoundly important, at least from my perspective. Being able to generalize and say 'there exists an ordered structure, or there does not' on top of it seems like something I vaguely identify as important. It would at least let you differentiate between computable function spaces and ones that cycle, which could tie all of that back to the computation it came from. I suppose the ideal is to program programs that program programs?
Just rambling though, thanks for humoring me!
I don't know what you mean by chasing residual, but I'm assuming it's that tiny margin of error you just can't seem to catch up to. I don't know whether this is possible, but so much of my insight on that is based on real life. The only places I've found reason to use any machine learning techniques are ones where I'm describing things about computable functions.
I'll definitely look into what you've said more precisely in the future; it certainly seems interesting. Personally, I'm just never quite sure what I understand and what I don't.
Have a good one!