That would suggest that one-hidden-layer neural nets would work fine, since they are also universal function approximators. But no -- when people talk about "deep learning", the word "deep" refers to having lots of hidden layers.
I'm not an expert, but the motivation seems more like this:
- Linear regression and SVMs sometimes work. But on their own they can only express linear relationships in the input features, so they apply to very few problems.
- We can fit those models using gradient descent (see the first sketch after this list). Alternatives to gradient descent do exist -- the normal equations for linear regression, quadratic programming for SVMs -- but they become less useful as the above models get varied and generalised.
- Empirically, if we compose these models with some simple non-linearities, we get very good results on otherwise seemingly intractable problems like OCR. See kernel SVMs and kriging.
- Initially, one might choose this non-linearity from a known list, then fit the model using specialised optimisation algorithms. But gradient descent still works fine (see the second sketch after this list).
- To further improve results, the choice of non-linearity must itself be optimised. Call the non-linearity F. We break F into three parts, F' ∘ L ∘ F'', where L is linear and F' and F'' are "simpler" non-linearities, then recursively factorise F' and F'' in the same way. Eventually we get a deep feedforward neural network (see the third sketch after this list). At this point we can no longer use fancy specialised algorithms to fit the model.
- Somehow, gradient descent, despite being a very generic optimisation algorithm, works much better than expected at fitting the above model. We have derived Deep Learning.
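
To make the second bullet concrete, here is a minimal sketch of fitting linear regression by gradient descent instead of the normal equations, assuming plain numpy. The data, learning rate, and step count are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)  # noisy linear targets

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)    # gradient of the mean squared error
    w -= lr * grad

print(w)  # ends up close to true_w
```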
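For the fixed-non-linearity bullet, a sketch in the same spirit: tanh applied to a fixed random projection plays the role of the chosen non-linearity (loosely, a random-features stand-in for a kernel method), and only the linear readout on top is trained, again by gradient descent. The target function and all constants are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel()                         # a target no linear model can fit

P = rng.normal(size=(1, 20))                  # fixed random projection
b = rng.uniform(-1, 1, size=20)
phi = np.tanh(x @ P + b)                      # fixed non-linear features of x

w = np.zeros(20)
lr = 0.01
for _ in range(5000):
    grad = 2 * phi.T @ (phi @ w - y) / len(y)
    w -= lr * grad

print(np.mean((phi @ w - y) ** 2))  # small: the features fit sin where a line cannot
```

Note the model is still linear in its trainable parameters, which is why it is easy to optimise; the next step is letting the non-linearity itself have parameters.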
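And for the last two bullets: recursing the F' ∘ L ∘ F'' factorisation twice gives a small feedforward net, tanh(L2(tanh(L1(x)))) followed by a linear readout, whose linear maps are trained end-to-end by plain gradient descent. Backpropagation below is just the chain rule applied through each factor; the architecture and hyperparameters are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x)

# parameters of the linear maps inside the composition
W1 = rng.normal(scale=0.5, size=(1, 32));  b1 = np.zeros(32)
W2 = rng.normal(scale=0.5, size=(32, 32)); b2 = np.zeros(32)
W3 = rng.normal(scale=0.5, size=(32, 1));  b3 = np.zeros(1)

lr = 0.02
for _ in range(5000):
    # forward: x -> tanh(L1) -> tanh(L2) -> L3
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    out = h2 @ W3 + b3
    # backward: chain rule through each factor of the composition
    d_out = 2 * (out - y) / len(y)            # d(MSE)/d(out)
    dW3 = h2.T @ d_out;  db3 = d_out.sum(0)
    d_h2 = (d_out @ W3.T) * (1 - h2 ** 2)     # tanh' = 1 - tanh^2
    dW2 = h1.T @ d_h2;   db2 = d_h2.sum(0)
    d_h1 = (d_h2 @ W2.T) * (1 - h1 ** 2)
    dW1 = x.T @ d_h1;    db1 = d_h1.sum(0)
    # one generic gradient-descent step on every parameter
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2),
                 (b2, db2), (W3, dW3), (b3, db3)):
        p -= lr * g

print(np.mean((out - y) ** 2))  # the loss drops: GD fits the composed model
```

No closed form or specialised solver applies here; the surprise in the last bullet is that this generic procedure works at all.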