
0 points · extremelearning · 7y ago · 0 comments

Generally, my answer is that this is for the same reason that lines of best fit are nearly always found via least-squares fitting.

Squaring has a few major benefits.

The first is that a square is never negative.

One might therefore ask: why don't we just take the absolute value (the 1-norm)? It turns out that the absolute value function makes many calculus expressions very messy. Thus, ironically, when analysing these concepts theoretically/algebraically, it is usually easier to square the errors (use the 2-norm) than to use the 1-norm.

The x^2 function is a very elegant function that smoothly curves. The |x| function has a pointy corner at x=0, which causes many analytical headaches.

(Although, I must admit that in recent years with large-scale computing, errors based on the absolute value are making a notable comeback, especially in machine learning!)
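To make the calculus point a bit more concrete, here is a minimal sketch in Python. The function names are my own, purely for illustration. With squared errors, setting the derivatives to zero gives a linear system with a closed-form answer for the line of best fit. The slope of e^2 is 2e everywhere, while the slope of |e| jumps from -1 to +1 at e = 0 and is not defined at the corner itself.

```python
def fit_line_least_squares(xs, ys):
    """Return (slope, intercept) minimising sum((y - (slope*x + intercept))**2).

    Closed form obtained by setting both partial derivatives of the
    squared-error sum to zero. Assumes the xs are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error_slope(e):
    """Derivative of e**2: smooth and defined for every e."""
    return 2 * e


def absolute_error_slope(e):
    """Derivative of |e|: +1 or -1, with no single value at the corner e = 0."""
    if e > 0:
        return 1.0
    if e < 0:
        return -1.0
    return None  # not differentiable at zero


slope, intercept = fit_line_least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

No comparably simple formula exists for the line minimising the sum of absolute errors; that one is normally found with an iterative or linear-programming method.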

Secondly, history seems to have shown that squaring is frequently the simplest transformation that leads to non-trivial results. Thus, the principle of Occam's razor would suggest that 2 is a very good place to begin and end.

Finally, if we consider higher powers, we still need our errors to be non-negative, which generally rules out cubes and other odd powers. Finding square roots and roots of quadratic equations is relatively simple, but finding roots of degree-4 polynomials is much harder, and roots of higher-degree polynomials generally have no closed-form expression at all.

Hope that helps!

1 comment · 1 top-level

JadeNB · 7y ago

Any answer to "why squaring?" that doesn't reference the usual Euclidean distance, even if indirectly via, say, the Pythagorean theorem, seems to be missing an obvious avenue of (if I may coin an awful word) intuitive-isation.
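As a small illustration of that connection (a sketch with made-up numbers and a hypothetical helper name): the sum of squared errors is just the squared Euclidean distance between the vector of observed values and the vector of predicted values, so minimising one minimises the other.

```python
import math


def sum_of_squared_errors(observed, predicted):
    """Sum of squared differences between paired values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Illustrative values only: a 3-4-5 right triangle.
observed = [3.0, 4.0]
predicted = [0.0, 0.0]

sse = sum_of_squared_errors(observed, predicted)
distance = math.dist(observed, predicted)  # Euclidean distance (Python 3.8+)

# Pythagoras: the squared distance matches the sum of squared errors.
print(sse, distance ** 2)
```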
