You need a Gold 6000 series or above to see any benefit from AVX512. In most other cases the CPU throttles down by some insane amount and there’s little to no benefit.
Did you guys get to test Epyc at CloudFlare?
The 7401P seems pretty special. Like, really great perf per $. I think SuperMicro are coming out with 1 socket Epyc boards/servers.
The terminology in this context is already fast and loose: it is rigorous in a practical engineering sense but far from a mathematical level of precision. As I pointed out above, the maintainers could just define Go to include a few assembly languages.
He doesn't like the OP's title and provides links:
> Very misleading title. Could just as well name it "accelerate sha256 up to 134x". You need to compare apples to apples. If AVX2 was used in the same way AVX512 is used, the speedup would be 2X at most. Reminds me of two of my papers https://eprint.iacr.org/2012/371.pdf https://eprint.iacr.org/2012/067.pdf
(from https://twitter.com/thecomp1ler/status/940724783804645376)
EDIT: Thanks, 'delhanty!
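For what it's worth, the "2X at most" figure in the quoted tweet falls out of lane counts: SHA-256 works on 32-bit words, so a 256-bit register holds 8 independent hashes and a 512-bit register holds 16. A rough sketch of that arithmetic (the `lanes` helper is made up for illustration, and this is an ideal-case ratio that ignores clock throttling and instruction differences):

```go
package main

import "fmt"

// lanes is a hypothetical helper: how many 32-bit SHA-256 words
// fit side by side in a vector register of the given width.
func lanes(registerBits int) int {
	return registerBits / 32
}

func main() {
	avx2, avx512 := lanes(256), lanes(512)
	// Ideal-case ratio only; real speedups depend on clocks and instruction mix.
	fmt.Printf("AVX2: %d lanes, AVX512: %d lanes, ratio: %dx\n", avx2, avx512, avx512/avx2)
}
```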
Most of the Gold and Platinum series chips don't start scaling frequency below baseline until around half the cores are using AVX512. The fanciest Platinum chips can use it on all cores with the only limit being that you can't Turbo quite as much: https://en.wikichip.org/wiki/intel/xeon_platinum/8180m
Without that capability, cloud providers wouldn't be able to offer multitenant VMs with access to the new instructions.
Intel Cannon Lake processors will support the SHA instruction extensions (currently available only on Goldmont). It will be interesting to see how that compares with this approach of running 16 SHA computations in parallel. You would be able to get rid of the scheduling overhead of having to first queue up 16 SHA calculations from other threads.
They're also already available on AMD Zen (Ryzen, Threadripper, Epyc, Ryzen Mobile).
Well, if you're going to dip into pedantic mode, couldn't the language maintainers just define Go to include a few relevant assembly instruction sets? (Not taking a dig at you but rather at the above level of pedantry.)
When C programmers write inline assembly, they don't pretend it's C code.