That doesn't really diminish the results in my book. If you're trying to publish something, it's basically assumed that you'll try out several methods and report the one with the best performance, even if the difference isn't statistically significant.
As for the number of features and data points: in this field you generally don't have a ton of subjects to sample from, and the high-dimensional features are a natural result of the array-based recordings. It would be possible to perform dimensionality reduction on the data, but the ML methods are already implicitly doing that step, so it's not necessarily that important.
My usual gripe is when the tested subjects have some data in the training fold and some in the testing fold (even if the individual data points are separate). In those cases the ML method can fit the statistics of a particular subject rather than the true target class (e.g. target movement of a cursor in a BCI). In this paper they are explicitly testing on a subject that was never trained on. So even though the data and the particular supervised layer will budge the results around some, it shouldn't be a night-and-day difference from what's expected in reality.
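To make the distinction concrete, here is a minimal sketch of a leave-one-subject-out split in plain Python. All names (`leave_one_subject_out`, the `"subject"` tag, the toy data) are hypothetical and just illustrate the idea: each test fold holds one subject's data in its entirety, so the model can never exploit subject-specific statistics it saw during training.

```python
def leave_one_subject_out(samples):
    """Yield (held_out, train, test) splits where the test fold holds
    exactly one subject's data and the train fold holds everyone else's."""
    subjects = sorted({s["subject"] for s in samples})
    for held_out in subjects:
        train = [s for s in samples if s["subject"] != held_out]
        test = [s for s in samples if s["subject"] == held_out]
        yield held_out, train, test

# Toy data: two recordings per subject (purely illustrative).
data = [{"subject": sid, "x": i} for sid in ("A", "B", "C") for i in range(2)]

for held_out, train, test in leave_one_subject_out(data):
    # No subject contributes to both folds, which is the property
    # the problematic "mixed" splits violate.
    assert not {s["subject"] for s in train} & {s["subject"] for s in test}
```

Contrast this with a random record-wise split, where subject A's morning session could land in training and their afternoon session in testing; the classifier can then score well by recognizing subject A rather than decoding the intended movement.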