The Mixed Track Record of Machine Learning Algorithms

(bloomberg.com)

52 points · briatx · 7y ago · 22 comments


pjmorris · 7y ago · 4 in thread

> “Machine learning algorithms will always identify a pattern, even if there is none,” he says.

I think that's perfectly said. Humans are prone to the same thing, but we've developed better coping mechanisms.

mlthoughts2018 · 7y ago

It’s a bit foolish though, because “regression” will always identify a pattern too, or many other simplistic models.

In ML, techniques to avoid overfitting or reporting spurious relationships are a first-order, 101 topic, especially among the type of ML engineer a hedge fund might hire (they are not hiring data science hacks).

On the flip side, I worked in a quant finance firm before that mostly did factor investing with some twists, and the overall statistical rigor was embarrassing. Even with simple regressions, nobody was asking basic robustness questions, p-hacking was daily life, directly comparing t-stats from different univariate model fits was considered “advanced feature selection.”

If a firm is going to do bad stats, they don’t need machine learning for that.
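To make the p-hacking point concrete, here is a minimal sketch (not anything from the firm described above) of what happens when many univariate fits are screened by t-stat. Every "factor" and the "returns" series are pure Gaussian noise, and the names `t_stat`, `count_spurious`, the threshold of 2.0, and the sample sizes are all illustrative assumptions.

```python
import math
import random


def t_stat(x, y):
    """t-statistic of the slope in a univariate least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def count_spurious(n_factors=200, n_obs=100, threshold=2.0, seed=0):
    """Count noise factors whose |t| exceeds the threshold against noise returns."""
    rng = random.Random(seed)
    returns = [rng.gauss(0, 1) for _ in range(n_obs)]  # hypothetical returns: noise
    hits = 0
    for _ in range(n_factors):
        factor = [rng.gauss(0, 1) for _ in range(n_obs)]  # hypothetical factor: noise
        if abs(t_stat(factor, returns)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_spurious(), "of 200 pure-noise factors pass |t| > 2")
```

With a threshold near 2, roughly one factor in twenty should pass by chance alone, so picking the "best" t-stats out of many fits will find winners even when nothing is there.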

pjmorris · 7y ago

I guess what I'm trying to say is that algorithms can't tell whether they're fooling themselves. Someone has to apply the 101 techniques for testing fit, etc. Humans at least have the opportunity, though, as you point out, they don't always take it.


crunchlibrarian · 7y ago

Most of machine "learning" in real-world use seems to be humans fiddling with weights until they get the magic number they want.

0-_-0 · 7y ago

That sentence is not true in the presence of good regularization. And doing proper regularization is a major part of machine learning as a discipline, just like having test and training sets to see if you overfit the data. So “Machine learning algorithms will always identify a pattern, even if there is none” is only true if you don't follow the basic best practices of machine learning.
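As a rough illustration of the regularization argument, here is a sketch of a one-variable lasso fit. It uses the closed-form soft-threshold solution for the objective (1/2n)·Σ(y − b·x)² + λ|b| on centered data. The helper names `lasso_slope` and `make_data`, the penalty λ = 0.25, and the synthetic data are assumptions made for the example, and it says nothing about how deep networks behave.

```python
import math
import random


def lasso_slope(x, y, lam):
    """Closed-form 1-D lasso slope on centered data (soft-thresholding)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    shrunk = max(abs(sxy) - lam, 0.0)  # shrink the covariance toward zero
    return math.copysign(shrunk, sxy) / sxx


def make_data(n=500, seed=0):
    """Return x, a pure-noise target, and a target with a real slope of 2."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    noise_y = [rng.gauss(0, 1) for _ in range(n)]
    signal_y = [2.0 * a + rng.gauss(0, 1) for a in x]
    return x, noise_y, signal_y


if __name__ == "__main__":
    x, noise_y, signal_y = make_data()
    print("unregularized slope on noise:", lasso_slope(x, noise_y, 0.0))
    print("lasso slope on noise:        ", lasso_slope(x, noise_y, 0.25))
    print("lasso slope on real signal:  ", lasso_slope(x, signal_y, 0.25))
```

The unregularized fit reports a small nonzero slope on noise, while the penalized fit reports exactly zero and still keeps most of the real slope. The price is some shrinkage of the true effect, which is why the penalty is usually chosen on held-out data.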

geebee · 7y ago · 4 in thread

“Machine learning algorithms will always identify a pattern, even if there is none”

Is this partly a problem with interpretation? Let's say I do a binary (supervised) classification with an algorithm that is also capable of assessing probabilities. If I generate a data set consisting of a randomized bag of words, and randomly assign them to 0 and 1 categories, and run it through a supervised ML classifier, then yeah, everything in the test set will get assigned to something.

But if you look at the probability estimates resulting from the ML, you'd almost certainly see something that indicates a high degree of randomness in the assignments (various techniques such as cross validation, or probabilities that indicate a high degree of uncertainty for almost all of the predictions).

I'm not sure this is a problem with the algorithm itself, because the output from many of these algorithms does indicate low predictive value.
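The thought experiment above can be sketched in a few lines. This is only an illustration under stated assumptions: each "document" is a random 32-bit bag-of-words mask, labels are coin flips, and the classifier is a plain 1-nearest-neighbour rule on Hamming distance (chosen because it memorizes the training set, not because anyone in the thread used it). All function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of vocabulary words present in exactly one of the two documents."""
    return bin(a ^ b).count("1")


def predict_1nn(train, doc):
    """Label of the training document closest to `doc` (train: list of (mask, label))."""
    return min(train, key=lambda pair: hamming(pair[0], doc))[1]


def accuracy(train, test):
    return sum(predict_1nn(train, d) == y for d, y in test) / len(test)


def experiment(n_docs=400, vocab_size=32, k=5, seed=0):
    """Return (training accuracy, k-fold cross-validated accuracy) on random data."""
    rng = random.Random(seed)
    data = [(rng.getrandbits(vocab_size), rng.randint(0, 1)) for _ in range(n_docs)]
    train_acc = accuracy(data, data)  # scored on the same data it memorized
    correct = 0
    for fold in range(k):
        held_out = data[fold::k]
        rest = [p for i, p in enumerate(data) if i % k != fold]
        correct += sum(predict_1nn(rest, d) == y for d, y in held_out)
    return train_acc, correct / n_docs


if __name__ == "__main__":
    train_acc, cv_acc = experiment()
    print(f"training accuracy: {train_acc:.2f}, cross-validated accuracy: {cv_acc:.2f}")
```

Every document gets assigned a label and the training score looks perfect, but the cross-validated score should sit near coin-flip level. That is the kind of output that signals low predictive value.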

sdenton4 · 7y ago

Check out this classic paper on deep learning with randomized labels: https://arxiv.org/abs/1611.03530

Spoiler: the neural net thinks it's doing a really good job!

geebee · 7y ago

Thank you for the link. I'll read this paper. I'm hoping to reply but the thread may be stale by the time I do. Right now, my thoughts are: if it is easily fitting, what are the assignment probabilities? Are we getting 90%+, or is it fitting easily but to much lower probabilities? Also, is there a big difference between neural nets and other algorithms like RF?

The paper certainly does appear to address the question of categorizing completely randomized input:

From the abstract:

"...our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise."


yters · 7y ago

That's because ML is given a model to fit to the data, so it'll find the best fit, even if the model doesn't represent the data.

geebee · 7y ago

Well, right. A binary classification system must assign a 0 or 1. But cross validation or other methods may reveal that it isn't a good fit, just a fit.

As mlthoughts pointed out in a different comment, any kind of regression technique faces issues about goodness of fit. The thing is, there are techniques to show you that the fit isn't very good. A simple linear regression will fit randomized noise, but there are outputs that can show you that the fit isn't good and the regression may not be reliable.

The question I have here is whether ML techniques are failing in a different way, that is, fitting to randomized noise while appearing by various tests to be a very strong fit. If they're failing the same way that regression would (i.e., someone applies it and fails to do basic tests for goodness of fit), that's a problem I suppose, but is it really a unique failing of ML or neural nets? It sounds more like a standard misapplication of predictive modeling...
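For the regression half of that comparison, a small sketch: ordinary least squares on two independent noise series always returns a line, but R² reports that the line explains almost nothing. The function names and the sample size are assumptions for illustration only.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance in y explained by the line
    return slope, intercept, r_squared


def fit_noise(n=500, seed=0):
    """Fit a line to two independent Gaussian noise series."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    return fit_line(x, y)


if __name__ == "__main__":
    slope, intercept, r_squared = fit_noise()
    print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.4f}")
```

The fit "finds" a slope, and the diagnostic says not to trust it. The open question in the comment is whether a model that memorizes random labels offers an equally honest diagnostic when it is only scored on its training data.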

screye · 7y ago

A good ML system has to be architected to exploit the known structure of the task being attempted.

CNNs exploit spatial locality and LSTMs exploit temporal locality. The SOTA models are architected with even stronger assumptions about the nature of the task. Methods like neural networks, random forests and SVMs, when used as unconstrained universal function approximators on unstructured data, only learn some non-linear polynomial/exponential/logarithmic combination of the data itself, without much nuance.

It is critical to help a model out by constraining the space of models it searches over to find the right answer. I think, unless we figure out a way to constrain architectures to exploit specific traits of the task they are trying to solve, (universal function approximator type) ML won't succeed in the same way that it has in vision / language.

As of now, the alternative is to use PGMs, where the model is fully interpretable as a graph-structured combination of explicitly parameterized random variables. PGMs work well with little data and give really good uncertainty estimates for evaluating the quality of a model. PGMs of course suffer from the problem that they are excruciatingly slow on large datasets and require a decent amount of prior knowledge about the problem to explicitly define the type of graph structure / random variables to use.

I think ML is most certainly capable of solving this problem, but the community is probably waiting for another breakthrough along the lines of AlexNet/LSTMs before that is the case.
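One way to see what "exploiting spatial locality" buys is a toy 1-D convolution. This is a sketch with made-up numbers: a single 3-weight kernel is reused at every position, so shifting the input shifts the output, and far fewer parameters are needed than in a dense layer of the same shape. The names `conv1d` and `shift_right` are hypothetical helpers.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation: the same kernel weights are reused at every position."""
    width = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def shift_right(signal):
    """Move the pattern one step to the right, padding with a zero."""
    return [0] + signal[:-1]


if __name__ == "__main__":
    signal = [0, 0, 1, 2, 1, 0, 0, 0]
    kernel = [1, 0, -1]  # a simple local edge detector

    out = conv1d(signal, kernel)
    out_shifted = conv1d(shift_right(signal), kernel)
    print(out)          # [-1, -2, 0, 2, 1, 0]
    print(out_shifted)  # [0, -1, -2, 0, 2, 1]

    # Parameter count: shared kernel vs. a dense map between the same sizes.
    print("conv parameters: ", len(kernel))
    print("dense parameters:", len(signal) * len(out))
```

The structural assumption (the same local pattern matters wherever it appears) is built into the model, so it does not have to be relearned at every position from data.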


bitL · 7y ago

Most of those replies sound like "we are experts, we know better" while stating badly outdated facts that probably passed through a marketing department on the way to those experts. I hope no progressive person wants to work for them, but rather to compete with them and drive them out of business, a typical pattern that repeats whenever somebody gets too cocky about their abilities.

pighive · 7y ago

Why are we seeing so many stories from bloomberg.com? What tier is Bloomberg’s credibility and their sources’? What part of technology or business is their journalism known for? I am new to the US, but I think my questions are not senseless. Thanks.

fathead_glacier · 7y ago

I find this article moot. They are basically saying that applying a blank ML stamp can yield unsatisfying results on your data. That ought to be obvious and seems unnecessarily repetitive.
