
0 points · parrt · 7y ago · 0 comments

I’m not sure about the connection to category theory. This is mostly an attempt to explain why this model works: it is performing gradient descent in a particular space. We find that extremely challenging to explain to students. I would be interested to know if you feel the article helps in that regard. Thanks.


3 comments · 1 top-level

s-shellfish · 7y ago · 2 in thread

My intuition from reading the explanations is you are performing gradient descent on gradient descent. If you understand gradient descent I don't see how this is challenging to explain to students. GBM just refines the problem space to a smaller context, but it's still the same mathematical operation of approximation. I think it gets confusing because of that 'recursive' nature. Disconnecting the explanation from the math that it's based on might be simpler to explain.

parrt (OP) · 7y ago

The key insight seems to be that chasing residuals (for MSE) or sign vectors (for MAE) is chasing a vector (i.e., a direction, not just a magnitude), and that vector is also a gradient. So chasing the residual is performing gradient descent.
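A minimal sketch of that claim, for readers who want to check it numerically. It assumes the losses are written as 0.5 * sum((y - y_hat)^2) for MSE and sum(|y - y_hat|) for MAE, and it treats the prediction vector y_hat itself as the thing being optimized. All function names and the toy data are made up for illustration; a real GBM would fit a weak learner to the direction vector rather than step along it directly.

```python
import numpy as np

def neg_gradient_mse(y, y_hat):
    # For L = 0.5 * sum((y - y_hat)^2), -dL/dy_hat is the residual vector.
    return y - y_hat

def neg_gradient_mae(y, y_hat):
    # For L = sum(|y - y_hat|), -dL/dy_hat is the sign vector of the residual.
    return np.sign(y - y_hat)

def numeric_neg_gradient(loss, y, y_hat, eps=1e-6):
    # Central finite differences, one coordinate of y_hat at a time.
    grad = np.zeros_like(y_hat)
    for i in range(len(y_hat)):
        step = np.zeros_like(y_hat)
        step[i] = eps
        grad[i] = (loss(y, y_hat + step) - loss(y, y_hat - step)) / (2 * eps)
    return -grad

def mse_loss(y, y_hat):
    return 0.5 * np.sum((y - y_hat) ** 2)

def mae_loss(y, y_hat):
    return np.sum(np.abs(y - y_hat))

# Hypothetical toy targets and current predictions.
y = np.array([3.0, -1.0, 2.5, 0.0])
y_hat = np.array([2.0, 0.5, 2.0, 1.0])

# The residual and the sign vector match the numerically estimated
# negative gradients of the two losses.
mse_matches = np.allclose(neg_gradient_mse(y, y_hat),
                          numeric_neg_gradient(mse_loss, y, y_hat), atol=1e-4)
mae_matches = np.allclose(neg_gradient_mae(y, y_hat),
                          numeric_neg_gradient(mae_loss, y, y_hat), atol=1e-4)

# "Chasing the residual": repeatedly add a fraction of the residual to the
# predictions. This is plain gradient descent on the MSE loss in prediction space.
eta = 0.1  # learning rate (shrinkage), chosen arbitrarily
f = np.zeros_like(y)
losses = []
for _ in range(50):
    losses.append(mse_loss(y, f))
    f = f + eta * neg_gradient_mse(y, f)

print(mse_matches, mae_matches, losses[0], losses[-1])
```

Each step shrinks the residual by a factor of (1 - eta), so the loss falls on every iteration. Swapping in neg_gradient_mae moves each prediction by a fixed eta toward its target instead, which is why the MAE version follows direction only and ignores the size of the error.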

s-shellfish7y ago

Having more than one way of explaining it can be helpful. Sign vector, direction - these things can be described in terms of orders of sets, monotonicity of compositional functions yielding recursive structures. I haven't done the actual work, so take this with a handful of salt.

I know that as a student who has taken machine learning classes, half the time it feels like the way I'm being taught is also like this ridiculous way of speaking to me to describe how my own learning is going. Do I understand the problem correctly, do I have to jump ahead, or take a step back? The language can sometimes matter, so the only reason I recommend category theory is because it helps take a step outside of the space that is describing all these successive optimizations. It allows one to describe the overall structure of how the mathematical operations interact with each other - in terms of sets of both numbers and functions that are either partially ordered or totally ordered, and from that, I would think more things could then be said about how each independent optimization function relates to the context (problem space) it is contained in. Something confusing seems to be that the actual calculation can have an unknown effect on the resulting function - so being able to think relationally about an individual calculation to the full computation - I dunno - that just seems very interesting to me.

Again, handful of salt, I'm no machine learning expert nor am I an expert in category theory, and I'm sure I'm not being as precise as I'd want to be if this was something I did career-wise (I just code stuff). Hobbyist interest that is a remnant of a time I once believed I could work on a PhD.

Point is, I'm just making sure you are just as well aware as I am of myself, that I just see an interesting connection in how machine learning is growing, and I like category theory. It's the math of math, or the logic of math, something like that. At the very least, having more than one way of seeing the problem can't hurt, can it? If they both yield the same statements, that's at least saying something slightly more concrete?

I'm not sure if I have any key insights to offer, just that the balance of each individual calculation versus the whole direction of machine learning seems to be something of profound importance, at least from my perspective. Being able to generalize and say 'there exists an ordered structure or there does not' on top of it - that seems like something I vaguely identify as important. It would allow one to at least differentiate between computable function spaces and ones that cycle, which could tie all that back into the computation from which it came, which I suppose is the ideal: program programs that program programs?

Just rambling though, thanks for humoring me!

I don't know what you mean by chasing residual but I'm assuming it's that little tiny margin of error you just can't seem to catch up to. I don't know whether this is possible, but so much of my insight on that is based on real life. The only places I've found reason to use any machine learning techniques is to describe things about computable functions.

I'll definitely look further into what you've said in the future in a precision sense, it does certainly seem interesting. I personally am just never quite sure what I understand and what I don't.

Have a good one!
