We know Intel has that problem with AVX-512. Those instructions give you a lot of throughput per cycle, but the cost is that they make the processor draw more power and run hot, so it has to downclock. It's possible (and really expected) that the same thing happens to some extent with code that sustains unusually high IPC. Getting 15% higher IPC doesn't really buy you anything if the processor has to lower its clock speed by 15% to execute that type of code.
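To put rough numbers on that: a minimal sketch, assuming throughput is simply IPC times clock frequency and ignoring memory stalls and everything else. The function names here are made up for illustration, and the 15% figures are just the hypothetical ones from above, not measurements.

```python
def relative_throughput(ipc_gain: float, clock_loss: float) -> float:
    """Instructions per second relative to baseline.

    ipc_gain and clock_loss are fractions (0.15 means 15%).
    Assumes throughput = IPC * frequency, nothing else.
    """
    return (1.0 + ipc_gain) * (1.0 - clock_loss)


def break_even_clock_loss(ipc_gain: float) -> float:
    """Fractional clock reduction at which an IPC gain is exactly cancelled."""
    return 1.0 - 1.0 / (1.0 + ipc_gain)


# 15% more IPC at a 15% lower clock: 1.15 * 0.85, slightly below 1.0
print(f"{relative_throughput(0.15, 0.15):.4f}")

# Clock drop that would wipe out a 15% IPC gain entirely (roughly 13%)
print(f"{break_even_clock_loss(0.15):.4f}")
```

So under this simple model, equal percentages don't even break even: the 15%/15% case comes out a little behind baseline, and any clock drop past about 13% turns the IPC gain into a net loss.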