In the past (roughly 10 years ago) it was a problem, as there were MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, XOP, 3DNow and perhaps a few more extensions.
It's not a typo, there are three 'S's :)
This is usually only done for very specific algorithms: Unicode validation, hash functions, things like that. Unless you have an absolutely tiny application (which you might, if you're targeting some kind of microcontroller), it's going to be a small percentage of your overall code size.
That is the case with GCC's __builtin functions. With a few exceptions, intrinsics are basically macros for inline asm that the compiler can reason about.
If on x86-64 you use a _mm256* intrinsic and compile without AVX support, you just get a compile error, not a pair of equivalent SSE instructions.
Second, if the intrinsics aren't built-in, there may be faster alternatives to the GCC-emulated version.
1) get external string
2) figure out if it is UTF-8, UTF-16, or some other recognizable encoding
3) validate the byte stream
4) figure out if the code points in the incoming string can be represented in Latin-1
5) instantiate a Java string using either the Latin-1 encoder or the UTF-16 encoder
I know some or all of these steps are done using HotSpot intrinsics, and then the JIT/VM does inlining, folding and so on. Still, I wonder how fast a custom assembly function that does all these steps at once could be.
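For reference, here is a minimal scalar sketch of steps 2 to 5 as a single function. The names `detect_encoding` and `ingest` are hypothetical. Encoding detection here only looks at a byte-order mark and otherwise assumes UTF-8, which is a simplifying assumption. The "LATIN1"/"UTF16" labels only mimic, in spirit, the two internal representations of Java's compact strings; this is not how HotSpot implements it.

```python
import codecs


def detect_encoding(data: bytes) -> tuple[str, int]:
    """Step 2 (simplified assumption): guess the encoding from a BOM, else UTF-8.

    Returns the codec name and how many BOM bytes to skip.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", len(codecs.BOM_UTF8)
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    return "utf-8", 0


def ingest(data: bytes) -> tuple[str, str]:
    """Hypothetical pipeline: detect, validate, check Latin-1 fit, pick a representation."""
    encoding, skip = detect_encoding(data)
    # Step 3: strict decoding raises UnicodeDecodeError on malformed input,
    # so here validation and decoding happen in the same call.
    text = data[skip:].decode(encoding, errors="strict")
    # Step 4: a code point fits in Latin-1 iff it is <= U+00FF.
    fits_latin1 = all(ord(ch) <= 0xFF for ch in text)
    # Step 5: choose a 1-byte-per-char representation when possible.
    coder = "LATIN1" if fits_latin1 else "UTF16"
    return coder, text
```

A fused SIMD routine would do the same work in one pass over the bytes, not in three separate passes (decode, scan, copy).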
edit: though this is AVX2 ("AVX-256") rather than AVX-512. Lemire has also written about AVX and whether it causes throttling in the past, so they're probably aware of the potential issue and judge that either it won't be triggered or the gain is large enough to compensate for the lower frequency.
Got any sources for power consumption figures/comparisons of those AVX units?
I think the reason for reducing clock speed when vector units are in heavy use is to keep power usage in check.
You might also find https://blog.cloudflare.com/on-the-dangers-of-intels-frequen... helpful, which goes into detail about a specific case where dynamic frequency scaling resulted in AVX-512 code running slower than AVX2 code.
A "few table names" doesn't mean much if the SQL file is 20GB.
In any case, sed and awk are plenty fast, but not the fastest methods of text manipulation. You could write a custom C program for that.
I would use Rust or C++ for this task.
See http://crashcourse.housegordon.org/coreutils-multibyte-suppo... and http://perl11.org/blog/foldcase.html for an overview of the performance problems.
This tool only does the minor task of validation of the UTF-8 encoding, nothing else. There are still the major tasks of decoding, folding and normalization to do.
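To make the distinction concrete, here is a small Python illustration of the four tasks on one input. It uses only the standard library (`bytes.decode`, `str.casefold`, `unicodedata.normalize`). It says nothing about how coreutils or Perl implement them. Note that Python's `decode` validates and decodes in one call, whereas a pure validator only answers yes or no.

```python
import unicodedata

# "Straße" followed by 'e' + U+0301 COMBINING ACUTE ACCENT (a decomposed 'é').
raw = "Stra\u00dfe e\u0301".encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Validation only: is the byte stream well-formed UTF-8?"""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Decoding: bytes -> sequence of code points.
text = raw.decode("utf-8")

# Case folding for caseless comparison: 'S' -> 's', 'ß' -> "ss".
folded = text.casefold()

# Normalization: NFC composes 'e' + U+0301 into the single code point U+00E9.
normalized = unicodedata.normalize("NFC", folded)
```

Validation alone leaves `raw` untouched. Folding and normalization change both the content and the length of the string, which is where most of the cost discussed in those links comes from.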
Substitution should not really be a relevant metric, as it wouldn't influence the result much: sed/awk still have to go through the whole file to find every occurrence they should substitute, and once they do find one, the substitution itself takes nanoseconds.
The size of the file is a better metric (e.g. how many seconds the whole 800 MB took).
Also, whether you used a regex in your awk/sed, and what kind. A badly written regex can slow down the search a lot.
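One way to report that is to time a single full pass and normalize by the input size. The helper below is a hypothetical sketch: `process` stands in for whatever tool or routine is being measured, and "MB" means 10**6 bytes.

```python
import time
from typing import Callable


def throughput_mb_per_s(process: Callable[[bytes], object], data: bytes) -> tuple[float, float]:
    """Time one full pass of `process` over `data`; return (seconds, MB/s)."""
    start = time.perf_counter()
    process(data)
    seconds = time.perf_counter() - start
    # Guard against a zero reading for extremely fast runs.
    seconds = max(seconds, 1e-9)
    return seconds, len(data) / 1e6 / seconds
```

For example, 800 MB processed in 4 seconds is 200 MB/s, whether the file contained ten matches or ten million.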
Did you set LANG=C before running sed, to bypass the UTF-8 logic?
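As an analogy (Python's `str` vs `bytes` regexes, not sed's actual implementation), this is the difference a byte-oriented locale makes. In a character-aware mode, `.` has to decode a multibyte UTF-8 sequence to match one character. In a byte-oriented mode, it matches single bytes, which is cheaper but can split a character.

```python
import re

s = "caf\u00e9"          # 'é' is one code point (U+00E9)...
b = s.encode("utf-8")    # ...but two bytes in UTF-8: 0xC3 0xA9

# Character-aware matching: '.' matches one code point.
chars = re.findall(".", s)

# Byte-oriented matching: '.' matches one byte, so no UTF-8 decoding
# is needed, but the multibyte 'é' comes back as two separate bytes.
octets = re.findall(rb".", b)
```

For plain ASCII patterns such as table names, the byte-oriented mode gives the same matches without the decoding work.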
Also, if you had a list of substitutions to perform, did you try writing them as a single sed script?
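The same idea in Python terms: instead of one full pass per substitution, build one alternation pattern and look up each match in a table. This is a hypothetical sketch (`multi_replace` is not from any library), not a description of how sed executes a multi-command script.

```python
import re


def multi_replace(text: str, table: dict[str, str]) -> str:
    """Apply all literal substitutions in `table` in a single pass over `text`."""
    if not table:
        return text
    # Longest keys first, so e.g. "user_id" wins over "user" at the same position.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is replaced by its table entry; the output is never re-scanned.
    return pattern.sub(lambda m: table[m.group(0)], text)
```

One behavioral difference to keep in mind: in a sed script with several `s` commands, later commands see the output of earlier ones on the same line. Here, replacements are never substituted again.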