
geebee · 7y ago

Thank you for the link. I'll read this paper. I'm hoping to reply, but the thread may be stale by the time I do. Right now, my thoughts are: if it is fitting easily, what are the assignment probabilities? Are we getting 90%+, or is it fitting easily but with much lower probabilities? Also, is there a big difference between neural nets and other algorithms like random forests (RF)?

The paper certainly does appear to address the question of categorizing completely randomized input:

From the abstract:

"...our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise."


3 comments · 1 top-level

geebee (OP) · 7y ago · 2 in thread

I'm going off a first pass through the paper, but it appears to show that the training error can be 0 on an entirely randomized data set, while the generalization error (the difference between the error on the test set and the error on the training set) increases dramatically as label corruption increases.

My understanding is that cross validation does multiple combinations of splitting the input data into test and training sets... so if cross validation measures the generalization error, wouldn't this catch the low predictive value resulting from randomization of labels or input?
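A rough sketch of that check, to make the question concrete. Nothing here comes from the paper: I'm assuming a 1-nearest-neighbour classifier as a stand-in for the network (picked only because it memorizes its training set trivially), pure-noise features, and coin-flip labels. All function and variable names are made up for illustration.

```python
import random

def nearest_label(x, points, labels):
    """Return the label of the stored point closest to x (1-nearest-neighbour)."""
    best = min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, points[i])))
    return labels[best]

def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of eval points whose nearest training point has the same label."""
    hits = sum(nearest_label(x, train_x, train_y) == y
               for x, y in zip(eval_x, eval_y))
    return hits / len(eval_x)

def k_fold_accuracy(xs, ys, k=5):
    """Plain k-fold cross validation: each point is held out exactly once."""
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        test_x = [xs[i] for i in sorted(test_idx)]
        test_y = [ys[i] for i in sorted(test_idx)]
        scores.append(accuracy(train_x, train_y, test_x, test_y))
    return sum(scores) / k

# Random noise inputs with random binary labels: there is nothing to learn.
rng = random.Random(0)
n_samples, n_features, n_classes = 300, 10, 2
xs = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
ys = [rng.randrange(n_classes) for _ in range(n_samples)]

# Evaluating on the training set: every point is its own nearest neighbour.
train_acc = accuracy(xs, ys, xs, ys)
# Evaluating on held-out folds: the memorized labels carry no information.
cv_acc = k_fold_accuracy(xs, ys)

print(f"training accuracy:  {train_acc:.2f}")
print(f"5-fold CV accuracy: {cv_acc:.2f}")
```

The training accuracy is perfect by construction, since each point matches itself, while the cross-validated accuracy should land around chance for two classes. That is the gap I'd expect cross validation to expose, at least for this toy memorizer.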

I'm not saying the paper doesn't have value, but I think it's more about the fact that neural nets can obtain a training error of zero on randomized data, not a testing error (or generalization error, which represents the difference between training error and testing error, as far as I can tell).

To be clear, I'm not an expert, and this is just what I gleaned from a first pass over the paper.

sdenton4 · 7y ago

All true. The interesting thing here is that the neural network has /no idea/ that it sucks at generalization, though. Yes, we can do extra work to calibrate outputs, but it would be much better to have some idea of uncertainty from the network itself.

(Added as edit) Also keep in mind that datasets themselves often fail to generalize: overfitting to a particular set makes for domain error when moving to slightly different data. Cross validation won't help with that, but more "self aware" algorithms might.

geebee (OP) · 7y ago

But... isn't that the entire point of splitting your initial training data into a training set and a separate testing set? Why is it better to have an idea of uncertainty from the model itself when you can get the generalization error through cross validation, or by setting aside a testing set?

It's interesting to see that a neural net will reach a training error of zero on randomized data, and it's a worthwhile contribution to the literature to demonstrate this, test it, and measure it... but the outcome here doesn't surprise me. From experience I know that random forests will also show nearly 100% accuracy on a training set but show far lower accuracy for a testing set, so while I think it's great to measure it, the conclusion in this paper is not surprising.

In no way is that a knock on the paper. People weren't surprised that Fermat's last theorem turned out to be true, but that doesn't make the proof any less of an accomplishment!

