This is worthless without actual numbers, which I doubt you have. Hardware people blame software, software people blame hardware, as it has always been, so mote it be, amen.
Let's say it is possible. That would mean current systems are about ten thousand times bigger than they could be. That's 4 orders of magnitude. And even if it isn't 4 full orders of magnitude, I'm willing to bet on 3.
It is not yet about raw speed, or latency. But when a system is at least 3 orders of magnitude bigger than it could be, it does mean that something there is vastly suboptimal. And runtime performance could very well be part of that "something".
But that's kind of a straw man. Even if you convince me that feature creep really is valuable, lack of features explains but 1 order of magnitude out of 4. There's still 3 to go. I have two explanations for those.
First, they reuse their code. A lot. When they write a compiler, all phases (parsing, AST to intermediate language, optimizations, code generation) are done with the same tool (augmented Parsing Expression Grammars, search for the OMeta language for more details). When they draw something on the screen, be it a window frame, a drawing, or text, they again use a single piece of code. Mere factorization goes a long way. I'd say it explains about 1 order of magnitude as well.
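For readers who have not met Parsing Expression Grammars, here is a minimal sketch in Python of the general idea: a few small combinators (sequence, ordered choice, repetition) plus semantic actions attached to rules. This is not OMeta and not the project's code; the names `lit`, `seq`, `choice`, `many`, `action` and the toy grammar are all made up for illustration.

```python
# Toy PEG-style combinators (hypothetical names, not OMeta's API).
# Each parser takes (text, pos) and returns (value, new_pos), or None on failure.

def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def seq(*parsers):
    """Match every parser in order; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def choice(*parsers):
    """PEG ordered choice: the first alternative that matches wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(p):
    """Match p zero or more times (p must consume input when it succeeds)."""
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

def action(p, f):
    """Attach a semantic action f to the value produced by p."""
    def parse(text, pos):
        result = p(text, pos)
        if result is None:
            return None
        value, pos = result
        return f(value), pos
    return parse

# Grammar:  number <- digit digit*      total <- number ('+' number)*
digit = choice(*[lit(c) for c in "0123456789"])
number = action(seq(digit, many(digit)), lambda v: int(v[0] + "".join(v[1])))
total = action(seq(number, many(seq(lit("+"), number))),
               lambda v: v[0] + sum(pair[1] for pair in v[1]))

print(total("1+20+3", 0))
```

Here the actions evaluate the sum directly. Swapping them for actions that build a tree, rewrite it, or emit code is what lets one mechanism serve several compiler phases, which is the kind of reuse described above.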
Second, their use of specialized languages yields astonishing results: they can build a self-implementing compilation system in about 1000 lines (including a bunch of optimizations). 200 more lines get you a reasonably efficient implementation of JavaScript, 200 more get you Prolog, and a couple hundred more can get you about any DSL you may want (external DSLs, not your average Ruby/Haskell combinator library). They implemented an equivalent of Cairo in 457 lines, which is about 100 times smaller (and quite efficient to boot, but that was a surprise bonus). They did a TCP/IP stack in about 160 lines, which again is about 100 times smaller than a typical C implementation. And they did all that with specialized languages that themselves are implemented in very little code. Based on that, I'd say their use of domain specific languages explains about 2 orders of magnitude. (Don't take my word for it. See their last progress report here: http://www.vpri.org/pdf/tr2011004_steps11.pdf )
To sum up, we could argue that current systems are about 4 orders of magnitude too big. Of the 4, 1 may be debatable (lots of features). Another (not reusing and factorizing code) is obviously something that has Gone Wrong™ (I mean, it could have been avoided if we cared about it). The remaining 2 (DSLs) are a Silver Bullet. Not enough to kill the Complexity Werewolf, but it sure makes it much less frightening. By the way, we should note that the idea of DSLs has been around for quite some time. Not using them so far may count as something that has Gone Wrong as well, though I'm not sure.
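The bookkeeping behind that summary can be spelled out in a few lines of Python. The numbers are just the rough estimates from this comment, not measurements, and the dictionary keys and the `orders_of_magnitude` helper are made up for illustration.

```python
import math

# Rough attribution of the size gap, in orders of magnitude (powers of ten).
# These are the comment's own estimates, not measured values.
budget = {
    "fewer features": 1,
    "code reuse / factorization": 1,
    "domain-specific languages": 2,
}

# Orders of magnitude add up, because the underlying ratios multiply.
total_orders = sum(budget.values())
size_ratio = 10 ** total_orders

def orders_of_magnitude(ratio):
    """Convert a plain size ratio (e.g. 100x smaller) to orders of magnitude."""
    return math.log10(ratio)

print(total_orders, size_ratio)
print(orders_of_magnitude(100))  # the "about 100 times smaller" figure
```

Since the ratios multiply, even if the feature argument is rejected entirely, the remaining 3 orders still amount to a factor of 1000.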
Here is what John Carmack says about his troubles with the lack of PC performance due to the multitude of APIs needed to reach the hardware:
John Carmack: ... That's really been driven home by this past project by working at a very low level of the hardware on consoles and comparing that to these PCs that are true orders of magnitude more powerful than the PS3 or something, but struggle in many cases to keep up the same minimum latency. They have tons of bandwidth, they can render at many more multi-samples, multiple megapixels per screen, but to be able to go through the cycle and get feedback... “fence here, update this here, and draw them there...” it struggles to get that done in 16ms, and that is frustrating.
Later in the article John expands on the thick software problem.
The article is here: http://pcper.com/reviews/Editorial/John-Carmack-Interview-GP...