Absolutely, yes. Or the optimization landscape suddenly changes. I had a super-cool trick for doing state transitions in a DFA at the rate of the throughput of a shuffle instruction rather than at the rate of the latency of a shuffle instruction, and smugly congratulated myself about how well it worked on Ivy Bridge (latency = 1, reciprocal throughput = 0.5). Then Haswell came along and took over the second shuffle unit to do 256-bit shuffles (latency = 1, reciprocal throughput = 1). So the clever trick went obsolete overnight. :-(
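For anyone wondering how a DFA step can be throughput-bound rather than latency-bound at all, here is a rough sketch of one way it can work. This is an assumption on my part for illustration, not necessarily the exact trick described above: treat each symbol's transition table as a 16-lane vector, note that shuffling one table by another composes them, and do that composition off the critical path. The `shuffle` helper models a 16-lane byte shuffle in plain Python, and the toy DFA, `TABLES`, and the function names are all made up.

```python
# Hypothetical model of a 16-lane byte shuffle (PSHUFB-style, indices 0..15).
# Illustration only; not the original author's code.

def shuffle(table, idx):
    """Lane-wise lookup: result[i] = table[idx[i]]."""
    return [table[i] for i in idx]

def pad(t):
    """Pad a small transition table to 16 lanes with identity entries."""
    return t + list(range(len(t), 16))

# Made-up 4-state DFA: TABLES[c][s] is the next state after reading c in state s.
TABLES = {
    "a": pad([1, 2, 3, 0]),  # 'a' advances the state by one, mod 4
    "b": pad([0, 0, 2, 2]),  # 'b' collapses states onto 0 or 2
}

def run_latency_bound(text, start=0):
    """One shuffle per symbol, each depending on the previous result."""
    state = [start] * 16  # current state broadcast to every lane
    for c in text:
        state = shuffle(TABLES[c], state)
    return state[0]

def run_throughput_bound(text, start=0):
    """Two shuffles per pair of symbols, only one of them on the dependency chain."""
    state = [start] * 16
    i = 0
    while i + 1 < len(text):
        # Compose the two tables: pair[s] = T[c1][T[c0][s]].
        # This does not read `state`, so it can overlap with the previous step.
        pair = shuffle(TABLES[text[i + 1]], TABLES[text[i]])
        # The only shuffle that depends on the running state.
        state = shuffle(pair, state)
        i += 2
    if i < len(text):  # odd-length tail
        state = shuffle(TABLES[text[i]], state)
    return state[0]
```

Both functions do the same total number of shuffles, but the second has a dependency chain half as long. With two shuffle units, the composing shuffle and the dependent shuffle can in principle issue side by side, so the cost approaches one cycle per two symbols. With a single shuffle unit they compete for it, and you are back to roughly one symbol per cycle.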