I am finding it extremely hard to envision a circumstance where this is a bottleneck for anything. Care to clarify the context?
I also find the struct layout really odd; why not just move c?
Your benchmarks are also probably broken: branch predictors use global state, so they will almost certainly predict correctly the way you've set things up. You need to repopulate a significantly sized array with randomly chosen values on each run. You can't reuse the same array because its pattern will be learned, and you can't use a short array because it will be predicted from global history.
For some additional context, these structs are packed wire-line protocol structs. In reality the padding bytes are full of other fun details.
For me, a wrapper includes the original thing and just wraps stuff around it. How does this work here? May also be a question to the author, I suppose now...
clang warning: flexible array members are a C99 feature [-Wc99-extensions]
is this still the case?
"Don't."
The end.
(Basically, as the story shows, with this kind of micro-optimization you may or may not beat the compiler but you're almost certainly wasting your time compared with more effective optimization methods, like rethinking the problem.)
Still, a business case can exist when the library is heavily utilized, or when a compiler isn't able to produce sufficiently fast code.
There are also cases, primarily in finance, where single-threaded low latency distinguishes competing groups. Some of those guys count every nanosecond.
The techniques described here (and in other places) are universally applicable.
Try as I might, I could not beat GCC [2], which used non-vectorized code. I chalk it up to not knowing how best to write optimized x86 code anymore (it's been years since I did any real assembly language programming) and I might be hitting some scheduling or pipeline issues, I just don't know.
[1] I described the code years ago here: http://boston.conman.org/2004/06/09.2
[2] I beat clang easily though.
When that isn't the case, it is just wasting money.
> you're almost certainly wasting your time compared with more effective optimization methods
Some rare cases absolutely do exist where specific micro-optimizations such as these may be useful - I'm not arguing that they don't.
Even in those cases, though, you're far more likely to achieve significant performance gains by taking a step back and re-examining your high level goals and your approach to the problem.
I'm not saying that eliminating a branch or two is never going to be useful. I'm just saying you should focus your attention elsewhere first.