I used to play 3D games on Pentium-based machines and I thought of them as a "huge upgrade" from 486, which in turn were a huge upgrade from 286, etc...
Now, people with Ice Lake CPUs in their laptops and servers complain that things are slow.
However, I think there is definitely less intrinsic optimisation from a dev perspective these days: people will iterate over the same array multiple times in different places rather than do it once.
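To make the "iterate once" point concrete, here is a minimal sketch in Python. The function names are made up for illustration and it assumes a non-empty list. It only shows the shape of the two approaches; whether the single pass is actually faster depends on the language, the data size, and how well the built-ins are optimised.

```python
def stats_multi_pass(values):
    # Three separate traversals of the same list, one per statistic.
    return min(values), max(values), sum(values)


def stats_single_pass(values):
    # One traversal that tracks all three statistics together.
    # Assumes `values` is non-empty (hypothetical helper, not from any library).
    it = iter(values)
    first = next(it)
    lo = hi = total = first
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
        total += v
    return lo, hi, total
```

Both functions return the same `(min, max, sum)` tuple; the second one just touches each element once instead of three times.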
I guess our industry has decided moving faster is better than running faster for a lot of stuff.
Also the pipeline length is certainly not 5 stages but more like 20-30.
> The dependency chain length is what is normally intended as instruction latency.
Yes, the way I read the original post and others was that you actually get your result back in 3 cycles, which isn't correct. It doesn't get committed for a while (but following instructions can use the result even if it hasn't been committed yet). Basically, you're not getting a committed result in less than 20 cycles.
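A rough way to see how both statements fit together is a toy cycle count. This is a deliberately simplified model, not how any real core is specified: it assumes a 20-stage pipeline and a 3-cycle execution latency (the numbers from this thread), perfect forwarding, and no stalls. The function names are hypothetical.

```python
def chain_cycles_with_forwarding(n_ops, exec_latency=3, pipeline_depth=20):
    # The first instruction pays the full pipeline depth before it commits.
    # Each further dependent instruction only waits for the previous result
    # to be forwarded, so it adds exec_latency cycles, not a full pipeline.
    return pipeline_depth + (n_ops - 1) * exec_latency


def chain_cycles_without_forwarding(n_ops, pipeline_depth=20):
    # If every dependent instruction had to wait for the previous one
    # to commit, each one would cost a full trip through the pipeline.
    return n_ops * pipeline_depth
```

In this model a single instruction takes 20 cycles to commit either way, but a chain of 100 dependent instructions takes 317 cycles with forwarding versus 2000 without, which is why the 3-cycle figure is the one that matters for dependency chains.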