9 comments · 3 top-level

Bartweiss · 9y ago · 2 in thread

Even that doesn't seem like a clear distinction?

If you ask me for the most likely new value for a dataset, I won't know. But if I graph a few things and then write a function to spit back the current mean or median, is that machine learning?

I'm not trying to be snarky there; I agree that the bulk of ML tools are fundamentally just statistical tricks with some layer of abstraction. As a result, I have a lot of trouble knowing how much abstraction justifies the ML title. I see some people using "statistics to produce unintuitive solutions" as a standard, but that just raises the question: unintuitive to whom?
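
The mean-or-median function from the comment above can be sketched roughly as follows. This is only an illustration of how little "learning" such a predictor involves; the name `fit_constant_predictor` and the sample data are hypothetical, not from the thread.

```python
from statistics import mean, median


def fit_constant_predictor(values, use_median=False):
    """Return a function that always predicts the mean (or median) of `values`.

    Hypothetical helper for illustration: the only thing "learned"
    from the data is a single summary number.
    """
    center = median(values) if use_median else mean(values)
    return lambda: center


# Made-up observations, including one outlier.
history = [3, 5, 4, 100]

predict_mean = fit_constant_predictor(history)
predict_median = fit_constant_predictor(history, use_median=True)

print(predict_mean())    # the mean is pulled toward the outlier
print(predict_median())  # the median is not
```

Whether this counts as machine learning is exactly the question the comment raises: the function is fit to data and used to predict, yet it is just a summary statistic.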

BickNowstrom · 9y ago

I feel like it is foremost a matter of the practitioner's attitude. An applied statistician and a machine learning engineer may deliver exactly the same end product; only the reasoning and assumptions differ. Machine learning makes few to no assumptions, whereas statistics does. I also feel that machine learning engineers have a bit less fear of building black boxes.

Caruana illustrated the difference between a statistician and a machine learning practitioner with a cartoon of a cliff. The statistician carefully inches toward the edge, stomping her feet to see if the ground is still stable, then 10 meters before the edge she stops and draws her conclusions. The machine learning practitioner dives headfirst from the cliff, with a parachute that reads "cross-validation".

See also:

http://norvig.com/chomsky.html On Chomsky and the Two Cultures of Statistical Learning.

And http://projecteuclid.org/euclid.ss/1009213726 Statistical Modeling: The Two Cultures by Leo Breiman.

and this joke:

> Norvig teamed up with a Stanford statistician to prove that statisticians, data scientists and mathematicians think the same way. They hypothesized that, if they all received the same dataset, worked on it, and came back together, they’d find they all independently used the same techniques. So, they got a very large dataset and shared it between them.

> Norvig used the whole dataset and built a complex predictive model. The statistician took a 1% sample of the dataset, discarded the rest, and showed that the data met certain assumptions.

> The mathematician, believe it or not, didn’t even look at the dataset. Rather, he proved the characteristics of various formulas that could (in theory) be applied to the data.

jorgemf (OP) · 9y ago

> Even that doesn't seem like a clear distinction?

Obviously not. ML uses statistics the way statistics uses maths. But not all ML uses statistics: some algorithms are biologically inspired (swarm optimization), others use information theory for classification.

The point of ML is that you learn something from data, not necessarily with statistics, although it is used in a lot of algorithms. Function optimization is also used in a lot of algorithms. The boundaries are very fuzzy, but for sure not all ML uses statistics and not all statistics is ML.

marcosdumay · 9y ago · 2 in thread

> it means it can predict things

All the other areas of math call that kind of prediction "interpolation". It's not a magical property that only ML has.

I'd draw the line by the name. An algorithm is ML if it involves the computer deriving a complex model based on data gathered in the field.
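
As a rough illustration of prediction as interpolation, here is a minimal sketch in plain Python. The function name `linear_interpolate` and the sample points are hypothetical, and it assumes the known `xs` are sorted in strictly increasing order.

```python
def linear_interpolate(xs, ys, x):
    """Predict y at x by linear interpolation between the two
    neighbouring known points.

    Assumes `xs` is sorted and strictly increasing, and that `xs`
    and `ys` have the same length (illustrative sketch only).
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the observed range")
    points = list(zip(xs, ys))
    # Walk over consecutive pairs of known points.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Made-up observations.
known_x = [0, 1, 2]
known_y = [0, 10, 40]

print(linear_interpolate(known_x, known_y, 1.5))
```

No model is derived here beyond straight lines between neighbouring observations, which is the sense in which this kind of "prediction" is not specific to ML.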

jorgemf (OP) · 9y ago

ML and statistics are both subsets of maths. As I said, statistics and ML overlap, and so does function interpolation. But some ML algorithms are based on biological systems (like swarm optimization) or on information theory.

If your problem is to classify some vectors, there are different ways to do it. You call all of them ML, but some use statistics, others use interpolation, others use information theory, etc. The model doesn't have to be complex or require a lot of data. Instead of listing all the different techniques, you sum them up as ML.

Indeed.

A lot of it went over my head because I don't know much classical statistics, but I read some articles by stats people that basically boiled down to the distinction not being in the techniques but in common assumptions, rigor, culture, etc.

collyw · 9y ago · 2 in thread

Predicting things seems to be the primary purpose of statistics in many cases.

jorgemf (OP) · 9y ago

I don't think so. I think it is more similar to this description: https://www.isixsigma.com/tools-templates/sampling-data/stat...

I'd say describing uncertain processes and measurements is the primary purpose. If you have a good description, you might be able to predict values as well.
