
0 points · gabrielgoh · 9y ago · 0 comments

A few misconceptions I should correct in this comment.

- There is plenty of existing technology for handling non-differentiable functions. Convex functions like the absolute value and the 2-norm have a generalization of the gradient (the subgradient) that can be used in lieu of the gradient.

- The idea that being "almost everywhere differentiable" (i.e. the non-differentiability lies on a manifold of measure zero) makes these functions behave pretty much like smooth ones. This is often not the case, as optima often conspire to lie exactly on these nonsmooth manifolds.
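To make both points concrete, here is a minimal sketch of the subgradient method on a one-dimensional toy problem. The objective f(x) = |x| + 0.5 (x - 0.5)^2, the 1/k step sizes, and the names `f`, `subgradient`, and `subgradient_method` are all assumptions chosen for illustration, not anything from the original discussion. The minimizer of this f sits exactly at the kink x = 0, since the subdifferential there is [-1.5, 0.5], which contains 0.

```python
def f(x):
    # Toy nonsmooth convex objective (an illustrative assumption).
    return abs(x) + 0.5 * (x - 0.5) ** 2


def subgradient(x):
    # sign(x) is a valid subgradient of |x|; at x == 0 any value in
    # [-1, 1] is valid, and this sketch picks 0.
    sign = (x > 0) - (x < 0)
    return sign + (x - 0.5)


def subgradient_method(x0, steps=2000):
    # Subgradient steps are not guaranteed to decrease f, so we keep
    # track of the best iterate seen so far.
    x = x0
    best_x, best_f = x, f(x)
    for k in range(1, steps + 1):
        x = x - (1.0 / k) * subgradient(x)  # diminishing step size 1/k
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x


x_star = subgradient_method(3.0)
print(x_star, f(x_star))
```

The iterates hover around the kink rather than settling on it smoothly, which is the kind of behavior a "pretty much smooth" intuition misses.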


2 comments · 2 top-level

kkylin · 9y ago

And error measures involving sums of absolute values (i.e., the L1 norm) are central to methods like lasso (https://en.wikipedia.org/wiki/Lasso_(statistics)) and its cousins.
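As a small illustration of why the L1 term matters, the scalar problem of minimizing 0.5 (x - v)^2 + t |x| has a closed-form solution, the soft-thresholding operator, which sets small inputs to exactly zero. The sketch below is a hedged example; the function name `soft_threshold` and the sample values are assumptions for illustration.

```python
def soft_threshold(v, t):
    # Minimizer of 0.5 * (x - v)**2 + t * abs(x) for t >= 0.
    # Inputs with |v| <= t land exactly on the kink at x = 0.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


coefficients = [3.0, -3.0, 0.5, -0.2]
shrunk = [soft_threshold(v, 1.0) for v in coefficients]
print(shrunk)  # [2.0, -2.0, 0.0, 0.0]
```

Coordinate-descent and proximal-gradient lasso solvers typically apply an update of this shape per coefficient, which is where the exact zeros in a lasso fit come from.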

highd · 9y ago

Yes, that was what I was saying. The absolute value and the 2-norm are fine thanks to subgradient techniques and theory, as well as their differentiability over most of their domain, but you can imagine tons of non-differentiable models where the subgradient is mostly useless, and there we generally use convex relaxations or other smoother analogs.

I don't think there was any misconception.
