undefined | Better HN

0 pointsunishark6y ago0 comments

Regularization is used by mathematicians and engineers with a different theoretical perspective. A regular solution is one that is smooth or well-behaved in some sense. The motivation to choose the penalty is based on desired properties of the solution. L2 regularization leads to the least-length solution, L1 is chosen if one wants the sparsest solution, and so on. It is a bit less of a deep motivation, but then the decision of which model to choose is always somewhat ad hoc.

0 comments

3 comments · 1 top-level

astrophysician6y ago· 2 in thread

A lot of tools in ML might be used in other places with other interpretations, and the intuition for L1 and L2 that you describe is not wrong at all, but (1) ML/DS is usually done in a statistical context so I would argue that it’s a good idea to understand that formalism, and (2) that intuition doesn’t sound like it would help you build more complex statistical models, whereas understanding where L1/L2 come from in a Bayesian context definitely would help you understand what you would need to do to form a regularization term for e.g. a probability, or how to edit your loss to learn uncertainty. It also helps you understand what not to do and why not.

All of this is opinion for the most part, and if you feel there is more to learn from alternative interpretations, fine, but the suggestion is to understand the fundamentals of what you’re doing, and you’re usually doing statistics in ML/DS (whether you know it or not). Also, understanding Bayesian stats will make your life easier and it will make it easier to understand lots of other ML concepts in a unified way rather than in an ad hoc way: “minimum length solution” or “sparse solution” is what I mean by ad hoc. Both of those things are true and important, but they’re ad hoc.

unisharkOP6y ago

I'd say pulling a "prior belief" out of the air for your problem, especially if it is conveniently chosen because it is one that is easy to work with, fits your rather broad definition of ad hoc too.

I'd even say the deterministic view is dominant currently. So yes by thinking differently one can get intuition beyond the common knowledge. But it's a nice-to-have not a necessity.

And one can do statistics without being a Bayesian of course.

astrophysician6y ago

> I'd say pulling a "prior belief" out of the air for your problem, especially if it is conveniently chosen because it is one that is easy to work with, fits your rather broad definition of ad hoc too.

This is the number one misunderstanding when it comes to Bayesian stats. Priors are hard, priors are often bullshit, priors are often the source of a “garbage in garbage out” problem, absolutely. I don’t mean to suggest Bayesian stats as something magical (magical thinking will get you in trouble). But whenever a statement like this is made, the implication is that there is some alternative where we can solve the same problem but without priors. That’s just not true: priors are an unavoidable fact of life. If you’re not explicit about your prior, it means you’re still using one but not being upfront about it. So I would agree that priors are difficult and problematic, but they are also unavoidable, and I would not say they’re “ad hoc”. I would also say it’s important to understand what they are.

> I'd even say the deterministic view is dominant currently. So yes by thinking differently one can get intuition beyond the common knowledge. But it's a nice-to-have not a necessity.

I don’t know what you mean by “deterministic”...do you mean “frequentist”? If so I would disagree completely. Frequentist and Bayesian views are equivalent except for philosophy, and frequentist stats are taught at all levels of school until grad school (at least in my experience) and I think that’s a huge mistake. What do you mean by “nice to have not a necessity”? If you’re solving a statistical question, stats are a necessity. Other fields are the nice-to-have intuition. I would agree however that sometimes you’re solving a NON-stats problem in which case have at it with whatever field makes sense.

> And one can do statistics without being a Bayesian of course.

Again, fine, I agree you can use the same math with a different philosophy, the philosophy is up to you, but if you think somehow you can do inference without priors I’m sorry but that’s wrong. In my experience “Frequentist” usually has meant Bayesian but with a flat prior (please comment if you have a counter example).

In summary: study what you want, and lots of perspectives bring more understanding, absolutely. But I stand by the importance of understanding Bayesian stats for doing ML. Even if you don’t like Bayesian stats, it’s still important to understand what is going on. Also I should be clear by “Bayesian” I mean nothing more than understanding what posteriors, priors, and likelihoods are, not a hierarchical model with MCMC or something.

1 more reply

j / k navigate · click thread line to collapse