If we come to accept cognition as fundamentally embodied then it becomes less sensible to compare cognition across differing architectures - human cognition will always be quite unlike any other type of cognition except itself. I think machines will have consciousness (why should they not be able to, what is so special about us that would limit this phenomenon to us?) but it will be a machine consciousness and radically different from ours.
I think we're going to have to get a lot more fine-grained about how we talk about features and functions of brains, whether human or machine. You've already put "smarter" in quotes, which shows you're aware of how blunt and crude our terms are.
Does this all seem reasonable?
Can a computer be "processing" water pipes, analyzing the construction of pipes for the best flow, and then jump to half-pipes and to building a new half-pipe so skateboarders can flow better, produce better tricks, get more air, etc.? It's a kind of lame example, but that jump is crucial, and it's something we do flawlessly. There's no hard guideline for what triggers our brains to jump. The trigger could be audible, visual, or tangentially related to the task at hand. It could be the body language of someone talking to us, which reminds us of somebody else, who reminds us of... Logical thought isn't that beautiful to me. It's predictable. Lateral thinking is, though, and that's where all of the good inventions/discoveries begin anyway.
And even more OT, this reminded me that I don't recall any "robot jokes" in any of the science fiction books I've read. Granted, there weren't that many (just the basics: Asimov, Frank Herbert, Philip K. Dick, some Stanislaw Lem), but I'm curious whether any SF writer has written "robot jokes", or more exactly, jokes that we humans think robots will make in the not-so-distant future.
- Computing power is getting exponentially cheaper even as computing requirements increase. The resources available to a university lab in the future will be much greater than those available today, even given the same budget. Of course this is also true for industry, but this growth is not a unique advantage of industry.
- Other scientific fields already have equipment costs that are orders of magnitude larger than those in CS. Physicists regularly write grant proposals for multimillion-dollar pieces of equipment. If building large clusters is necessary for academic research to stay relevant, academics will start building large clusters. The foundational work done at Bell, IBM, Xerox, etc. in the 70s and 80s was not due to resource constraints in academia (academics had expensive computers too, and also did plenty of good work during that time). It happened because those companies had the right combination of smart people and an immediate need to find practical solutions to difficult problems.
- Finally, and most importantly, even in the age of big data, almost all fundamental research can be done quite successfully at small scales with modest hardware requirements. Notice that Hinton et al. have spent 6+ years developing deep learning in academia, and it's only in the past couple of years that it's matured to the point of implementation at scale.
Here's the basic pipeline of most machine learning research: you come up with a new approach for training SVMs, or multilayer perceptrons, or some new type of more interesting model. First you develop your ideas conceptually, with some equations on a whiteboard. If you're a theorist, you might prove some theorems. Next you write a toy implementation in Matlab or Python to show that your method actually works, and that you get improvement over previous work for the dataset size you're using. This could mean that your method is faster -- which indicates it'll be able to scale to bigger data -- or that it's smarter / taking advantage of some new type of structure, in which case it still ought to get decent (if not state-of-the-art) results on small data. Only then, usually after publishing a few papers and working out the kinks, does it generally make sense to put in the effort to implement and test a big, efficient distributed version of your algorithm. And while that last part might be best done by industry, the first few steps are easily possible in academia and will continue to be for the foreseeable future.
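To make the "toy implementation" step concrete, here's a rough sketch in plain Python. Everything in it is made up for illustration (the synthetic two-blob dataset, the function names, the sizes), and a perceptron is just a stand-in for "your new method", compared against a majority-class baseline on small data:

```python
import random

def make_data(n, seed):
    """Hypothetical toy dataset: two Gaussian blobs in 2-D, labels -1 / +1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x = (rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0))
        data.append((x, y))
    return data

def train_perceptron(train, epochs=10):
    """Classic perceptron updates: adjust weights only on misclassified points."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in train:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
    return w, b

def perceptron_accuracy(model, test):
    w, b = model
    correct = 0
    for (x1, x2), y in test:
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
        correct += pred == y
    return correct / len(test)

def majority_baseline_accuracy(train, test):
    """'Previous work' stand-in: always predict the most common training label."""
    positives = sum(1 for _, y in train if y == 1)
    guess = 1 if positives * 2 >= len(train) else -1
    return sum(1 for _, y in test if y == guess) / len(test)

def run_experiment(n_train=200, n_test=200, seed=0):
    train = make_data(n_train, seed)
    test = make_data(n_test, seed + 1)  # separate seed, so a held-out sample
    model = train_perceptron(train)
    return {
        "perceptron": perceptron_accuracy(model, test),
        "baseline": majority_baseline_accuracy(train, test),
    }

if __name__ == "__main__":
    print(run_experiment())
```

The whole thing runs on a laptop in well under a second, which is the point: you can tell whether the idea beats the baseline long before anyone writes a distributed version.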
Case in point: Google Translate is a massive system whose performance rests squarely on exploiting big data, in that they use the Internet as their training set. But academic machine translation research still runs quite effectively with smaller datasets on small clusters. The academics come up with ideas, implement and test them, and some ideas flop while others take off. The ideas that take off get picked up by Google and implemented into Translate, where they hopefully end up pushing the envelope. So even though the academics don't have the resources to work at massive scale (which most of them don't want to do anyway -- ML researchers are usually more interested in ML than in building distributed systems), their research still has impact, through transfer to industry. This sort of relationship has been the model for academic/industry research collaboration for quite a while, and I don't think it's dead yet.
I've got no formal CS training, and if I get funding for jkl.io the objective is to have (most of) a Google News (English) competitor implemented in a year, part-time. Google has thousands of ML employees but there are three million users on Github. If I need facial recognition, it's on Github. Topic modelling to layer on top of my NLP, or to aid in entity resolution: on Github. Crawlers, got it. Next gen databases (http://hyperdex.org/), got it. The jkl.io site is only just over 1000 lines of code written by me at the moment, but it probably uses tens of thousands from the Python libraries alone, before we even talk about the DB and the OS.
The more people understand the filter bubble and information diet concepts, the more personalisation will be a thing only for side interests and friendship networks. I don't think people want black-box, advertising-oriented algorithms manipulating their political and economic news. The computation I need is therefore much smaller and cheaper. I know it's not HN's focus, because people want their exit money, but donation models, as Wikipedia beating Knol shows, can actually be the most efficient solution in many domains where you can't trust a corporation with a fiduciary duty to maximize shareholder profit.
People might say "but what about really huge data like location services using not just GPS, but mobile data and wifi response times, pictures from Google's new alt-reality game and street view"; they might say "Google just can't be caught up to" and point to the failure of Apple's maps. But I worked with some guys who scaled a solution using SIFT features => Lucene that could geo-locate instantly on massive datasets of images. You can prove an algorithm can scale theoretically without having 10,000 machines to run it on. One of the key points separating computer science from just programming is the analysis of algorithms in theoretical terms. Apple's failure happened because they are primarily a luxury product company, not an ML company, but people just think "technology". Even so, Apple can get stuff done, or buy companies that can (Siri). Microsoft, Yandex, Yahoo, Amazon, huge rising data powers in Asia, thousands of computer science professors, tens of thousands of post docs and doctoral students, and millions of Github tinkerers are not going to fall behind. Google isn't even the major search engine in a lot of countries.
He had presented a graph comparing some models and their accuracy as the number of features was scaled up to the tens of thousands, his point being that some models that work best with a smaller number of features fall off as the number is scaled up. Unfortunately the slides he has on his web page are outdated, so I haven't been able to find that reference. I'd be very happy if one of you knows which paper he was referring to. In the old slides he refers to this paper, which makes something of the same point: http://ai.stanford.edu/~ang/papers/nipsdlufl10-AnalysisSingl... It shows how simple unsupervised models with dense feature extraction reach the state-of-the-art performance of more complex models.
Of course, I can see how it makes sense to at least do some small-scale prototyping, to work out kinks like you say - but the lesson is that if you are planning to do large-scale machine learning, you can't necessarily use the small-scale tests as a good guide for large-scale performance. It's certainly promising if you get very good accuracy, speed or both at small scale, though neither will necessarily carry over to large scale. On the flip side, if your method is worse than state-of-the-art at smaller scales, that doesn't mean it won't beat state-of-the-art at large scales.
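One cheap way to look at this is to measure accuracy at several training-set sizes instead of one, and watch whether the gap between methods is stable or shifting. A rough sketch in plain Python follows. The two models (nearest centroid and 1-nearest-neighbour), the XOR-style synthetic data, and the sizes are all stand-ins picked for illustration, and sizes this small obviously say nothing about behaviour at real scale:

```python
import random

def make_data(n, rng):
    """Hypothetical XOR-style data: four blobs, label 1 when the signs agree."""
    data = []
    for _ in range(n):
        cx, cy = rng.choice([-1, 1]), rng.choice([-1, 1])
        label = 1 if cx == cy else 0
        data.append(((rng.gauss(cx, 0.6), rng.gauss(cy, 0.6)), label))
    return data

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def fit_nearest_centroid(train):
    """Simple model: one mean point per class, predict the closest mean."""
    sums = {}
    for (px, py), y in train:
        sx, sy, count = sums.get(y, (0.0, 0.0, 0))
        sums[y] = (sx + px, sy + py, count + 1)
    centroids = {y: (sx / c, sy / c) for y, (sx, sy, c) in sums.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(centroids[y], x))

def fit_one_nn(train):
    """Data-hungry model: predict the label of the closest training point."""
    return lambda x: min(train, key=lambda item: sq_dist(item[0], x))[1]

def accuracy(predict, test):
    return sum(1 for x, y in test if predict(x) == y) / len(test)

def learning_curve(sizes=(20, 80, 320), n_test=200, seed=0):
    rng = random.Random(seed)
    test = make_data(n_test, rng)
    pool = make_data(max(sizes), rng)
    curve = {}
    for n in sizes:
        train = pool[:n]  # nested subsets, so only the size changes
        curve[n] = {
            "nearest_centroid": accuracy(fit_nearest_centroid(train), test),
            "one_nn": accuracy(fit_one_nn(train), test),
        }
    return curve

if __name__ == "__main__":
    for n, scores in learning_curve().items():
        print(n, scores)
```

If the curves are still moving relative to each other at the largest size you can afford, that at least tells you the small-scale ranking shouldn't be trusted yet.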
How then do you decide which projects are worth trying on the large scale?
That's not what AGI is trying to do or how they are trying to do it.
But IBM is currently doing (and making good progress at) exactly what the author describes as phase 3.
That's a fascinating yet chilling thought (granted, orders of magnitude dumber than those future thoughts of the machines).