> I want to emphasize that historically, from the very first moment somebody thought of computers, there has been a notion of: “Oh, can the computer talk to me, can it learn to love?” And somebody, some yahoo, will be like, “Oh absolutely!” And then a bunch of people will put money into it, and then they'll be disappointed.
Reminds me of a pre-transistor computing quote from Charles Babbage, about some overeager British politicians:
> On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
I remember hearing from some old salt in the oil business that geologists are the wrong people to ask about peak oil. They always underestimated future discoveries. The ones that tended to get it right were financiers and investors.
The idea is that geologists have their noses down in the details of practical, useful knowledge that they have or can get. Financiers don't really know anything, just that wells have been found in the past. They just model things like exploration money, the rate and quality of new finds, oil prices, production costs...
There could be something similar here. The real technology people see mostly problems. All the stuff that would need to be solved, that they have no idea how to solve. The fact that we don't even know what intelligence is. The frauds making audacious claims.
Outsiders see drones, self-driving cars, spam filters, Google search, chess, face recognition, translation, chatbots^. They see that voice recognition now works. I reckon medical diagnosis might do something soon. In any case, it seems that one way or another, these add up to something. ...just as a hunch.
Obviously I don't know the answer and this whole comment is based on an anecdote that may not even be true. Still, I don't discount the possibility that the unwashed masses are right.
^just kidding
The term green lumber refers to a story by authors Jim Paul and Brendan Moynihan in their book What I Learned Losing A Million Dollars, where a trader made a fortune trading lumber he thought was literally "green" rather than fresh cut.[26] "This gets at the idea that a supposed understanding of an investment rationale, a narrative or a theoretical model is unhelpful in practical trading."[27]
The protagonist makes a big discovery. He remarks that a fellow named Joe Siegel, one of the most successful traders in a commodity called "green lumber," actually thought that it was lumber painted green (rather than freshly cut lumber, called green because it had not been dried). And he made it his profession to trade the stuff! Meanwhile the narrator was into grand intellectual theories and narratives of what caused the price of commodities to move, and went bust. It is not just that the successful expert on lumber was ignorant of central matters like the designation "green." He also knew things about lumber that nonexperts think are unimportant. People we call ignorant might not be ignorant. The fact is that predicting the order flow in lumber and the usual narrative had little to do with the details one would assume from the outside are important. People who do things in the field are not subjected to a set exam; they are selected in the most nonnarrative manner—nice arguments don’t make much difference.[25]
1. The Fifth Generation Project (https://en.wikipedia.org/wiki/Fifth_generation_computer) was a 1980s effort, officially ending in 1992, not 'late 1990s' (during the Dot-com bubble?!).
2. The Lisp bubble didn't pop because of a failed DoD piloting project; it popped because of the first AI Winter + commodity SPARC/x86 pressure + recession (https://en.wikipedia.org/wiki/Lisp_machine) (and I don't recall DARPA instituting any policy like 'no AI', just stopping subsidizing Symbolics and later Connection Machine).
3. The Club of Rome report couldn't've killed its modeling language, because it only really acquired its present ill repute by the 1990s; the implementation language Modelica (https://en.wikipedia.org/wiki/Modelica) didn't die (last release: April 2017) and is still in industrial use, which is more than almost all languages from the 1960s-1970s can say, and even the World3 model (https://en.wikipedia.org/wiki/World3) analyzed in the report continued development for decades.
4. The Oxford paper (https://www.fhi.ox.ac.uk/wp-content/uploads/The-Future-of-Em...) doesn't make precise forecasts for when any automation may happen (merely saying "associated occupations are potentially automatable over some unspecified number of years, perhaps a decade or two").
5. The GPU server comparison is really weird, as computers have almost always cost more than humans, and only relatively recently do any computers' hourly costs fall below minimum wage.
6. The Dartmouth description is wrong; the conference merely proposed (http://www-formal.stanford.edu/jmc/history/dartmouth/dartmou...) that meaningful progress could be made by 10 researchers, not grad students ("We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College...We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.")
Also, come on dude, Keras isn't hard to use - it's not even comparable to Tensorflow. But at least he didn't tell the tank story.
Despite all that, it's a great antidote to the overhype that I see most days.
Luckily math has developed methods such as error-detecting/error-correcting codes (to insure against small typos/transmission errors) and constructive results on continuity and robustness of functions (i.e. we can prove that if the error in the input data is less than some concretely computable delta, the solution will have an error less than epsilon; or we can ensure that the error in the solution is less than some computable epsilon if we can ensure that the error in the input data "is not too large", i.e. bounded by some computable delta), etc.
In this sense I don't consider the question as that absurd.
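To make the error-correcting-code point concrete, here's a minimal sketch (my own toy example, nothing from the thread) using a 3x repetition code: any single flipped bit per triple is recovered by majority vote, which is exactly the "insure against small transmission errors" idea.

```python
# Toy 3x repetition code: each bit is sent three times and decoded
# by majority vote, so any single transmission error per triple is corrected.
def encode(bits):
    return [b for b in bits for _ in range(3)]

def decode(coded):
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]

msg = [1, 0, 1, 1]
sent = encode(msg)
sent[4] ^= 1                 # flip one bit in transit (a small "typo")
assert decode(sent) == msg   # the error is detected and corrected
```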
People using Babbage's machine would have entered raw information into that thing. No error-correcting code would correct the human-induced flaws in that. So the question was absurd at the time.
All these solutions are good for a noisy input, but have no use when the input is incorrect (i.e. doesn't match reality).
A colleague of mine called these "educated incapacities" - where we become acutely aware of impossibilities and lose sight of possibilities. Andrej Karpathy, in one of his interviews iirc, said something like "if you ask folks in nonlinear optimization, they'll tell you that DL is not possible".
It is useful to keep that innocence alive despite being educated, especially if the cost of trying something out doesn't involve radical health risks. That plus a balance with scholarship.
Knowledge, courage and the means to execute are all needed.
I sincerely doubt anyone who knows more than one sentence about deep learning would say that, since deep learning doesn't claim to find a global optimum.
There's a small cottage industry of papers (like [0]) that try to explain this.
Regarding deep NNs, one should be careful what one wishes for, because sometimes wishes come true. Ending up with the global optimum of that thing would likely be the last thing one wants.
The key to deep NNs is to do such a pathetic job of optimizing the loss that the generalization is good. A problem is that there are several different ways of doing a job poorly, and not all of them generalize well. When I have my engineer hat on, I would rather not have lots of indeterminism on my watch if I can afford it. Too dang hard to maintain correctness of.
On the other hand if one has a "with high probability" style result where the probabilities are high enough to be practically relevant, then we have something more workable.
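For what it's worth, the "deliberately poor job of optimizing" idea has a very mundane concrete form: early stopping. A rough sketch (my own toy in plain numpy, nothing tuned) of halting gradient descent when held-out error stops improving, long before reaching any optimum:

```python
# Rough sketch of early stopping: stop gradient descent on the training
# loss as soon as the held-out (validation) error stops improving.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_w = np.zeros(50)
true_w[:5] = 1.0
y = X @ true_w + rng.normal(scale=0.5, size=200)

X_tr, y_tr, X_val, y_val = X[:100], y[:100], X[100:], y[100:]

w = np.zeros(50)
best_val, best_w, stall = np.inf, w.copy(), 0
for step in range(5000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # gradient of mean squared error
    w -= 0.05 * grad
    val = np.mean((X_val @ w - y_val) ** 2)
    if val < best_val - 1e-6:
        best_val, best_w, stall = val, w.copy(), 0
    else:
        stall += 1
        if stall >= 25:   # validation error has stopped improving: quit early
            break
print(step, round(best_val, 3))
```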
As I see more of the failure modes of deep learning, a lot of successes and mistakes made by humans start to become more understandable. Machines don't need to be perfect or avoid failures; like humans, they need to work most of the time and then be used in systems that are tolerant of their potential faults and mistakes.
The whole interview was an absolute joy to read.
Minimum wage (or thereabouts, $7.20) now gets you a whopping p2.8xlarge (8 GPUs, 32 vCPUs, 488 GB RAM), and the single-GPU machine p2.xlarge is now $0.90 per hour.
This is a crazy data point. What will minimum wage buy you five years from now?
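Quick back-of-envelope with the numbers quoted above (taking the US federal minimum wage as $7.25/hour; the instance prices are the on-demand figures from the comment):

```python
min_wage   = 7.25   # $/hour, US federal minimum wage
p2_8xlarge = 7.20   # $/hour, 8 GPUs (as quoted above)
p2_xlarge  = 0.90   # $/hour, 1 GPU  (as quoted above)

print(min_wage / p2_8xlarge)  # ~1.0: one hour worked buys ~one hour of the 8-GPU box
print(min_wage / p2_xlarge)   # ~8.1: or ~eight hours of the single-GPU box
```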
That's completely over-simplifying matters. Data scientists also drink soy lattes and ride children's push scooters.
As another person who's seen robots fall over again and again and has a sense of the scope of the difficulty of the problem, I'd say there's also the risk of the day-to-day failures making us lose sight of the forest for the trees, with availability bias working against us.
Also,
> the Y Combinator autistic Stanford guy thing
> the Aspy worldview
It's a bit worrying that the use of these terms has turned into a kind of slur, lumping a kind of imagined stunted worldview with a medical diagnosis. I'm not particularly pissed that this guy used these, more worried about what it indicates - that these have become so common as to infiltrate friendly informal conversations from seemingly intelligent people.
I wish I saw more than one or two of these a year.
It’s shocking to me how much technical people buy into this, how “this time it’s different” and AI isn’t “over-promising and substantially under-delivering” this time. Really odd to watch it come round again, when the reality is we’re more likely to see incremental progress, fueled partly by more compute and algorithmic advances, and partly by a lot of PR.
Also, we judge the difficulty of things by our own experience. It took us ~1 billion years to get to the point where we could communicate abstract ideas and play chess. These were once believed to be the challenging problems in AI.
It turned out that chess is easy; we're just relatively bad at it.
What's yet to be seen is if startups can profit from this advance, since it depends on massive data and compute.
Technically you could do a lot of the decision-making it'll be doing with human-made models and a lot of data, but the machine is cheaper and it's backed by consulting agencies.
RPA was the first indication. It's basically screen scraping and small bots, stuff that's been around for a long time; I mean, it's basically what people use to bot in video games. Yet it's become a multimillion-dollar industry over the course of a few years because it caught the right drift.
Like RPA, machine learning isn't just hype. It actually does some things with data really well, and when you couple that with the fact that ministers want this tech, well, that's all you need.
Anyway, overall great article, but this was the one thing that bothered me enough to comment.
> It’s really bad to use. There’s so much hype around it, but the number of people who are actually using it to build real things that make a difference is probably very low.
I wonder how many data scientists out there are actually developing Tensorflow models for a mission-critical project at work. I'm not. I have used Tensorflow successfully within my personal projects, but I've yet to need it for anything "real."
That service offering was marginal relative to the rest of the business, so it never became something our sales team pitched to customers very aggressively; in this particular case TensorFlow did not move the needle, so to speak.
It is probably quite far from standard usage, but TensorFlow can be used to write custom graphical-model inference, for example. To be practical, these algorithms can't be implemented in, say, pure Python.
The point is that TensorFlow gets you pretty close to assembly-level computation. The alternative is to write in, say, Cython, which is much more time-consuming and does not give you parallelization for free. Another alternative I guess would be Torch, but that is the same as TensorFlow the way I see it.
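As a hedged illustration of the "TensorFlow as a generic numerics engine" point (not the graphical-model inference the parent describes, just a minimal 1.x-style sketch of building and running a non-deep-learning computation graph):

```python
# Minimal TensorFlow 1.x-style sketch: no neural network, just a compute
# graph fitting a line by gradient descent, run on whatever device TF picks.
import numpy as np
import tensorflow as tf

x = np.random.rand(1000).astype(np.float32)
y = 3.0 * x + 2.0 + 0.1 * np.random.randn(1000).astype(np.float32)

w = tf.Variable(0.0)
b = tf.Variable(0.0)
loss = tf.reduce_mean(tf.square(w * x + b - y))          # mean squared error
train = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train)
    print(sess.run([w, b]))   # should land near (3.0, 2.0)
```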
> And even up to last year, there’s just massive bugs in the machine learning libraries that come bundled with Spark. It’s so bizarre, because you go to Caltrain, and there’s a giant banner showing a cool-looking data scientist peering at computers in some cool ways, advertising Spark, which is a platform that in my day job I know is just barely usable at best, or at worst, actively misleading.
By way of anecdote, Spark's MLlib used to contain an implementation of word2vec that failed when used on more than 2 billion words (some arcane integer overflow). So much for scale!
As for performance, in 2016, the break-even point where a Spark cluster started being competitive with a single-machine implementation was around 12 Spark machines (a bit of a hindrance to rapid iterative development, which is the cornerstone of R&D): https://radimrehurek.com/florence15.pdf
I think most people don't have big data (Amazon has an x1 with 4 TB of RAM, after all!) but there's no shame in that. I'll use a big machine for grid search or other embarrassingly parallelizable stuff, but I can confirm that Spark is usually a bad tool for actual ML unless you use one of their out-of-the-box algos. Even then, tuning the cluster on EMR with YARN is a pain, especially for pyspark. There's a gap, I think, between the inflated expectations of "I'm going to get general AI in 5 years and CHANGE THE WORLD" and "this K-means clustering will be a good way to explore our reviews", but somewhere in the middle there is actual value.
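A hedged sketch of that "one big machine for embarrassingly parallel stuff" workflow (my own example using scikit-learn's bundled digits data, not anything from the interview): hyperparameter search fans out across local cores, no cluster required.

```python
# Embarrassingly parallel hyperparameter search on a single machine:
# GridSearchCV fits one model per (parameter combo, CV fold) across all cores.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    cv=5,
    n_jobs=-1,   # use every local core instead of a Spark cluster
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```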
(I also hate that "AI" is becoming the new hype train; I don't consider anything of what I do to be "AI", but you have people calling CNNs or even non-deep-learning models "AI".) This is only going to result in inflated expectations. DS practitioners have to communicate the value without hype, and also find a way to weed out charlatans.
I think much of the negativity towards DS from the programming community is because the Data Scientist is what the programmer used to be ~15 years ago. It's that nerdy thing for a select group of very smart people, whereas being a software developer/engineer/architect/whatever has become just another common job (at least outside of Silicon Valley).
Also, from my experience as the lone developer taking the first steps to implement machine learning techniques in my company - lots of developers also think DS/ML is a cool thing with value, but they simply, absolutely don't understand it (and don't want to put in the effort to learn). These techniques are not hard and not magic, but they require a completely different way to think about problems than "traditional" programming does. I've seen developers up and down the hierarchical ladder struggle with wrapping their heads around these concepts, and it's way easier to dismiss it all as "hype" instead of accepting the fact that these techniques will be a huge part of what software development will look like in the future.
But all those things people did in the '90s or even earlier. It was called "data warehousing" or "decision support" back then. The fundamental techniques - linear regression, logistic regression, k-means clustering - go back even earlier, to the OR community post-WW2. Banks have been doing credit scoring with these techniques for a loooong time. The manufacturing industry has been using these techniques for even longer. Engineering for even longer than that.
So you can see why people are quite cynical about the way old, established techniques are being presented as the hot new thing - and you can see why people who have been doing this stuff for 20+ years might be annoyed at 20-somethings who claim to have invented this new thing. What's wrong with someone calling themselves a "statistician" or an "applied mathematician"?
But this is by no means purely a DS thing; it seems no one is a programmer anymore either, they're all "senior certified enterprise solution architects" or some grandiose thing.
I would say data warehousing is more concerned with things like OLAP, Star Schema, ETL, etc. than what people are calling 'data science' right now. The same goes for 'decision support', since data warehousing grew out of decision support systems. The biggest overlap here is with 'data mining' algorithms like association rules and clustering.
> The fundamental techniques - linear regression, logistic regression, k-means clustering - go back even earlier, to the OR community post-WW2.
Here I think you've got a stronger argument. OR has a long, proud history of using applied math for business objectives. But again, I would say most of OR deals with different problems and different techniques - it's more about prescriptive analytics, constrained optimization, linear programming, simulations, etc. than the type of predictive modeling in most data science.
I see data science as a separate field even though it's stitched together from a bunch of others. It's certainly not entirely new, and certainly overhyped in some annoyingly-breathless news reports. I could say the same thing about CS - was it entirely "new" when it started as a discipline? Isn't CS "just" applied math?
To be fair, few of the "senior architects" I've worked with in big companies knew how to program very well.
There are two groups:
- People who are overly enthusiastic about neural nets
- People who are cynically calling every ML algorithm "AI", up to and including linear regression
and I'm more annoyed at the last one.
[1]: https://projecteuclid.org/download/pdf_1/euclid.aoms/1177704...
I personally think that the main reason why general AI may be very far away is that there is little incentive today for working on it. Specialized AI seems good enough to drive cars. Specialized AI should be good enough to put objects in boxes, cut vegetables, flip burgers and so on, and the economic impact of building that is much greater than the economic impact of making a robot that barely passes the Turing test and is otherwise fairly dumb or ethically unbounded.
This isn't really true, since this can be said of any ML model. ML is nothing new. Deep learning is new. It works because we have so much data that we can start to extract complex, nonlinear patterns.
Brilliant.
> Because the frightening thing is that even if you remove those specific variables, if the signal is there, you're going to find correlates with it all the time, and you either need to have a regulator that says, “You can use these variables, you can't use these variables,” or, I don't know, we need to change the law. As a data scientist I would prefer if that did not come out in the data. I think it's a question of how we deal with it. But I feel sensitive toward the machines, because we're telling them to optimize, and that's what they’re coming up with.
So is he saying that he is worried optimisation throws up results that are not what he would like to see?
You're looking to pick the fastest runners out of a group of people. You run an optimization algorithm to pick out the fastest in that group. Nothing about this optimization accounts for the fact that 1/3 of the people in the group were shot in the foot with a gun prior to your optimization. The data will show that they are poor runners without addressing the crime previously committed. In fact, many people would consider it a second act of crime.
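To spell out why the optimizer looks "correct" while still compounding the harm, here's a tiny simulation (my own numbers, purely illustrative): underlying ability is identical across groups, but one group's observed times carry an external penalty, and selecting on observed times excludes that group almost entirely.

```python
# Toy simulation of the runner analogy: equal ability everywhere, but one
# group's observed times include a penalty imposed before the "optimization".
import numpy as np

rng = np.random.default_rng(0)
n = 9000
ability = rng.normal(10.0, 1.0, n)                # true sprint times, same for all groups
injured = np.arange(n) % 3 == 0                   # a third of the group was "shot in the foot"
observed = ability + np.where(injured, 3.0, 0.0)  # the harm shows up only in the data

fastest = np.argsort(observed)[:1000]             # "optimize": take the 1000 best observed times
print(injured[fastest].mean())                    # ~0.0: the injured group is filtered out
```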
We know that no manager got fired for choosing Java.
There is a researcher's version of that: no researcher got fired for making a neural network more 'convoluted'. It helps if there exists one dataset where it does 0.3% better. Doesn't matter if that data set is (and has been since the late '90s) standard fare as a homework problem in a machine learning course.
That said we do understand these things a bit better than before. Some concrete math is indeed coming out.
I personally don’t like the phrase data scientist but I get it and I get why it’s science as opposed to engineering. I personally like the split between machine learning, BI, and data engineering.