As we know, however, benchmarking can often come down to tuning. If this most basic of compiler options has not been set to the obvious choice for speed, how can we have any confidence that the C code itself has been written efficiently?
Are we comparing language against language here, or somebody's implementation in one language against somebody's implementation in another?
I note that there appear to be hand optimisations in the C code. Were these done well, or would the compiler have done a better job?
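To make the question concrete, here is a minimal sketch of the kind of comparison I have in mind. It is not taken from the benchmark's code: the functions `sum_plain`, `sum_unrolled` and `check_sums_agree` are hypothetical names of my own, and summing an array merely stands in for whatever the real hot loop does. The first version leaves unrolling and vectorisation to the compiler; the second is unrolled by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain loop: any unrolling or vectorisation is left to the compiler. */
uint64_t sum_plain(const uint32_t *a, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-unrolled by four with separate accumulators (illustrative only). */
uint64_t sum_unrolled(const uint32_t *a, size_t n)
{
    uint64_t t0 = 0, t1 = 0, t2 = 0, t3 = 0;
    size_t i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        t0 += a[i];
        t1 += a[i + 1];
        t2 += a[i + 2];
        t3 += a[i + 3];
    }

    /* Tail: the zero to three elements left over. */
    for (; i < n; i++)
        t0 += a[i];

    return t0 + t1 + t2 + t3;
}

/* Sanity check: both versions must agree before any timing means anything. */
void check_sums_agree(void)
{
    uint32_t data[11];
    for (size_t i = 0; i < 11; i++)
        data[i] = (uint32_t)(i + 1);   /* 1..11, which sums to 66 */

    assert(sum_plain(data, 11) == 66);
    assert(sum_unrolled(data, 11) == 66);
}
```

Which of the two is faster is not something I would guess at. It depends on the compiler, the optimisation level and the target, so the only way to answer the question is to build both with optimisation enabled (for example `-O2` or `-O3` on GCC or Clang) and time them on the same data.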