In the past (roughly 10 years ago) it was a problem, as there were MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, XOP, 3DNow!, and perhaps a few more extensions.
It's not a typo, there are three 'S's :)
This is usually only done for very specific algorithms: Unicode validation, hash functions, things like that. Unless you have an absolutely tiny application (which you might, if you're targeting some kind of microcontroller), it's going to be a small percentage of your overall code size.
That is the case with GCC's __builtin functions. With a few exceptions, intrinsics are basically macros for inline asm that the compiler can reason about.
If you use a _mm256* intrinsic on x86-64 and compile without AVX support, you just get a compile error, not a pair of equivalent SSE instructions.
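Since the compiler will not lower a 256-bit intrinsic to narrower instructions for you, code that wants to build everywhere usually has to pick the path itself. Here is a minimal sketch in C, assuming the GCC/Clang convention of predefined `__AVX2__` and `__SSE2__` macros. The function name `is_all_ascii` is made up for illustration and is not taken from any library discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__) || defined(__SSE2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: returns 1 if every byte in buf is ASCII (< 0x80),
 * 0 otherwise. The widest code path enabled at compile time is used. */
int is_all_ascii(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    assert(buf != NULL || len == 0);

#if defined(__AVX2__)
    /* Only compiled when AVX2 is enabled (e.g. -mavx2). */
    for (; i + 32 <= len; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(buf + i));
        /* movemask gathers the top bit of each byte; nonzero means non-ASCII. */
        if (_mm256_movemask_epi8(v) != 0)
            return 0;
    }
#elif defined(__SSE2__)
    /* Explicit 128-bit fallback: the compiler will not derive this for us. */
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        if (_mm_movemask_epi8(v) != 0)
            return 0;
    }
#endif

    /* Scalar tail, and the only path on targets without the macros above. */
    for (; i < len; i++) {
        if (buf[i] & 0x80)
            return 0;
    }
    return 1;
}
```

This selects the path at compile time only. Choosing at run time (one binary that works on CPUs with and without AVX2) would need CPU feature detection on top, which this sketch leaves out.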
Second, if the intrinsics aren't built-in then there may be faster alternatives than using the GCC-emulated version.
1) get external string
2) figure out if it is UTF-8, UTF-16, or some other recognizable encoding
3) validate the byte stream
4) figure out if the code points in the incoming string can be represented in Latin-1
5) instantiate a Java string using either the Latin-1 encoder or the UTF-16 encoder
I know some or all of these steps are done using HotSpot intrinsics, and then the JIT/VM does inlining, folding and so on, but I wonder how fast a custom assembly function to do all these steps at once could be.
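To make the "all at once" idea concrete, steps 3 and 4 can at least be fused into a single scalar pass. The following is a rough sketch in C, not how HotSpot does it and not a SIMD implementation. The names `classify_utf8`, `UTF8_INVALID`, `UTF8_LATIN1` and `UTF8_WIDE` are invented for this example, and it assumes the input has already been identified as UTF-8 (step 2 is not covered).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: invalid input, valid and fits in Latin-1,
 * or valid but needs a wider representation such as UTF-16. */
enum utf8_class { UTF8_INVALID = 0, UTF8_LATIN1 = 1, UTF8_WIDE = 2 };

/* Validates UTF-8 and, in the same pass, tracks whether every code point
 * is <= U+00FF. */
enum utf8_class classify_utf8(const uint8_t *s, size_t len)
{
    enum utf8_class result = UTF8_LATIN1;
    size_t i = 0;
    assert(s != NULL || len == 0);

    while (i < len) {
        uint8_t b = s[i++];
        uint32_t cp;   /* decoded code point */
        uint32_t min;  /* smallest code point allowed for this length */
        size_t n;      /* number of continuation bytes expected */

        if (b < 0x80) {
            continue;  /* ASCII: always valid, always Latin-1 */
        } else if ((b & 0xE0) == 0xC0) {
            cp = b & 0x1F; n = 1; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            cp = b & 0x0F; n = 2; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            cp = b & 0x07; n = 3; min = 0x10000;
        } else {
            return UTF8_INVALID;  /* stray continuation or illegal lead byte */
        }

        if (len - i < n)
            return UTF8_INVALID;  /* sequence truncated at end of input */

        while (n--) {
            uint8_t c = s[i++];
            if ((c & 0xC0) != 0x80)
                return UTF8_INVALID;
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong forms, values past U+10FFFF, and surrogates. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return UTF8_INVALID;

        if (cp > 0xFF)
            result = UTF8_WIDE;
    }
    return result;
}
```

For instance, the bytes of "café" would classify as `UTF8_LATIN1`, while anything containing "€" (U+20AC) would come back as `UTF8_WIDE`. A caller could then pick the encoder for step 5 from that result without scanning the input a second time.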
Edit: though this is AVX2 ("AVX-256") rather than AVX-512, and Lemire has covered AVX and the possibility of throttling (with or without AVX) in the past. So they're probably aware of the potential issue and consider that either it won't get triggered or the gain is good enough to compensate for the lower frequency.
Got any sources for power consumption figures/comparisons of those AVX units?
A "few table names" doesn't mean much if the SQL file is 20GB.
In any case, sed and awk are plenty fast, but not the fastest methods of text manipulation. You could write a custom C program for that.
I would use Rust or C++ for this task.
See http://crashcourse.housegordon.org/coreutils-multibyte-suppo... and http://perl11.org/blog/foldcase.html for an overview of the performance problems.
This tool only does the minor task of validation of the UTF-8 encoding, nothing else. There are still the major tasks of decoding, folding and normalization to do.