Got any sources for power consumption figures/comparisons of those AVX units?
I think the reason for reducing clock speed when vector units are in heavy use is to keep power usage in check.
You might also find https://blog.cloudflare.com/on-the-dangers-of-intels-frequen... helpful, which goes into detail about a specific case where dynamic frequency scaling resulted in AVX-512 code running slower than AVX2 code.
This seems like an optimisation nightmare. Your program needs to know both which instructions the chip supports and which specific model it is within a family, in order to decide whether it actually wants to use certain vector instructions.
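To make that concrete, here is a rough sketch of the two-step decision: first check the capability flag, then check the model against a list of parts where you would rather avoid AVX-512. Everything here is my own illustration: `CpuInfo`, `parse_cpuinfo`, `choose_kernel` and the `avx512_denylist` parameter are hypothetical names, the input format is assumed to be Linux `/proc/cpuinfo`-style text, and the denylist entry is a made-up model string. Which real chips belong on such a list is something you would have to measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuInfo:
    """Hypothetical container for the two facts the dispatcher needs."""
    model_name: str
    flags: frozenset


def parse_cpuinfo(text: str) -> CpuInfo:
    """Extract the first 'model name' and 'flags' entries from
    /proc/cpuinfo-style text (assumed format: 'key : value' per line)."""
    model_name = ""
    flags = frozenset()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a key/value pair
        key = key.strip()
        if key == "model name" and not model_name:
            model_name = value.strip()
        elif key == "flags" and not flags:
            flags = frozenset(value.split())
    return CpuInfo(model_name, flags)


def choose_kernel(cpu: CpuInfo, avx512_denylist=()) -> str:
    """Pick a code path name: 'avx512', 'avx2' or 'scalar'.

    avx512_denylist is a hypothetical tuple of substrings; if any of them
    appears in the model name, AVX-512 is skipped even when supported.
    """
    avx512_ok = "avx512f" in cpu.flags and not any(
        pattern in cpu.model_name for pattern in avx512_denylist
    )
    if avx512_ok:
        return "avx512"
    if "avx2" in cpu.flags:
        return "avx2"
    return "scalar"


# Made-up sample input, not taken from a real machine.
SAMPLE = "model name\t: Example CPU A\nflags\t\t: fpu sse2 avx2 avx512f\n"

cpu = parse_cpuinfo(SAMPLE)
default_choice = choose_kernel(cpu)
cautious_choice = choose_kernel(cpu, avx512_denylist=("Example CPU A",))
```

On Linux you would pass in the contents of `/proc/cpuinfo` instead of `SAMPLE`. The awkward part is exactly what the comment above complains about: the flag check is easy, but the denylist has to be maintained by hand and depends on the workload as much as on the chip.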
* https://lemire.me/blog/2018/04/19/by-how-much-does-avx-512-s...
* https://lemire.me/blog/2018/08/13/the-dangers-of-avx-512-thr...
* https://lemire.me/blog/2018/08/15/the-dangers-of-avx-512-thr...
* https://lemire.me/blog/2018/08/24/trying-harder-to-make-avx-...
* https://lemire.me/blog/2018/08/25/avx-512-throttling-heavy-i...
* https://lemire.me/blog/2018/09/04/per-core-frequency-scaling...
* https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-us...