Which is a large part of what makes that attribution amusing.
"There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of non-critical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
"Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code, but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail."
The context for this statement is, of course, his article [1] arguing that goto statements should not be regarded as bad for religious reasons, but should be used appropriately. How common is the pragmatic approach to goto today, vs the "religious" response to goto?
So this is the context in which it's appropriate to quote Knuth on this: if you are thinking about efficiency all the time, making intuitive guesses about where your programs will be slow, using measurement tools after the fact, and in danger of spending your optimization efforts on 100% of the code instead of 3%, and if you have a rational rather than emotional response to seeing goto statements in your codebase, then Knuth's quote is for you. What he was really arguing for is a practical, rational approach rather than religious, emotional responses, which was also Dijkstra's point about goto. The whole article is well worth a read if you haven't already seen it.
In a very real sense, this is like a lot of the rhetoric around Big O analysis. From what you hear online, you would think that Knuth's analysis focused only on the Big O descriptions of algorithms. Reading the book, however, you find much more detailed treatments of individual algorithms. The aim was never to give up on the small details, but to recognize that the comparison is dominated by the big ones.
So it is with efficiency. The assumption was that, by and large, you did not pick purposely inefficient methods, and that you could then focus on the small-scale details where the work gives the best return.
Now, I think "clean code" is getting thrown under the bus a bit here. Picking the specific examples that were easy to talk about in a pedagogical manner is not doing any favors to the general ideas from either side. And I consider myself mostly on the anti side of the "clean code" debate.
> Programmers waste enormous amounts of time thinking about, or worrying about, the speed of non-critical parts of their programs
I don’t think this is widely true today (outside e.g. game development anyway). If it were, we’d probably have lots of fast software that’s very hard to read! But computers were a lot slower then, compilers weren’t as advanced, and it would make a lot of sense if performance was top of mind for most programmers at the time.
I think the point was that programmers' intuition about performance is wrong. So we'd wind up with a lot of slow, buggy software — buggy because the code became hard to read when it was prematurely optimized.
When you initially write your code, you won't know where the bottlenecks are. So first choose and write a sensible implementation using appropriate algorithms and data structures to complete the task.
Then, when you have something working, measure its performance against real-world data, not against synthetic benchmarks. You can use synthetic data that models the real world (e.g. a large number of user records in the system) to amplify the performance issues, but measurements taken on real-world data will be better.
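A minimal sketch of that measurement step, using Python's standard cProfile and pstats modules. The function names (slow_total, fast_total, profile) and the list of integers standing in for user records are hypothetical; in practice you would profile your own code on your own data.

```python
import cProfile
import io
import pstats


def slow_total(records):
    """Sum the distinct values (hypothetical workload, deliberately quadratic)."""
    seen = []
    total = 0
    for r in records:
        if r not in seen:  # linear scan of a list on every iteration
            seen.append(r)
            total += r
    return total


def fast_total(records):
    """Same result, using a set to find the distinct values."""
    return sum(set(records))


def profile(func, *args):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


if __name__ == "__main__":
    # Stand-in for real data: 2000 distinct values, each appearing twice.
    records = list(range(2000)) * 2
    for func in (slow_total, fast_total):
        _, report = profile(func, records)
        print(report)
```

The report shows where the time goes per function, which is the information you want in hand before deciding what, if anything, is worth rewriting.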
With that performance profile, you can see exactly where the performance issues are. That justifies writing more complex or harder-to-read code for those specific parts of the codebase, because you know the change actually improves performance rather than being something you only think will improve it. Examples of that kind of targeted optimization:
1. The optimized inverse square root function in Quake III Arena (https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...)
2. Improving the performance of the SHA-1 implementation in git. (There was a discussion a while ago on the mailing lists about rewriting the old code in assembly, which I can't find because Google doesn't understand my search queries. In those discussions, IIRC, Linus ended up writing a C implementation that compilers were able to compile into efficient assembly.)
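For a sense of what the first example looks like, here is a rough Python transcription of the fast inverse square root trick: reinterpret the float's bits as an integer, subtract half of it from a magic constant, then do one Newton-Raphson step. The original is C and operates on 32-bit floats; this version is only an illustration of the idea (the function name is mine), and it will not be faster than 1 / math.sqrt(x) in Python.

```python
import struct


def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x that fits in a 32-bit float."""
    # Reinterpret the bits of the 32-bit float as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Initial guess from the well-known magic constant.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits as a 32-bit float again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration to refine the guess.
    return y * (1.5 - 0.5 * x * y * y)


if __name__ == "__main__":
    import math

    for x in (0.25, 1.0, 2.0, 100.0):
        print(x, fast_inverse_sqrt(x), 1.0 / math.sqrt(x))
```

Nobody would call this readable without the comments, which is the point: you pay that cost only where measurement shows it matters.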
Implicit, I argue, is the idea that you were also not reaching for methods that added inefficiencies. That is, I think it is fine to argue that you can and should try to write efficient code at a high level.