// WARNING: This code has not been tested on big-endian platforms!
// It is known to work well on little-endian platforms that have a small penalty
// for unaligned reads, such as current Intel and AMD moderate-to-high-end CPUs.
//
// By the way, for some hash functions, given strings a and b, the hash
// of a+b is easily derived from the hashes of a and b. This property
// doesn't hold for any hash functions in this file.
x86
========================
641f1cd0505d1ff9 : I
e394c6831e9d6e71 : do
651c1a4b0b3f2d88 : not
9fc671af2d051786 : know
1fa385757f0016ec : about
f2407bd9ce678f7f : making
73bd225c6b6b8163 : awkward
ea78562e35777eb1 : rhopalic
3c854a925e6e4a9e : sentences
84a04e9aa8dae9f5 : .
PPC
========================
641f1cd0505d1ff9 : I
e394c6831e9d6e71 : do
651c1a4b0b3f2d88 : not
147d90cc620fc6b2 : know
23af6044b2394703 : about
4f80227202845190 : making
4b8b4b3cd98a1de9 : awkward
61b38def2ee9465f : rhopalic
29016ee1c7c61e93 : sentences
84a04e9aa8dae9f5 : .
For anyone that cares, the test program was… #include <stdio.h>
#include <string.h>
#include "city.h"
const char *word[] = { "I", "do", "not", "know", "about",
"making", "awkward", "rhopalic",
"sentences", ".", 0 };
int main(int argc, char **argv)
{
for ( const char **p = word ; *p != 0; p++) {
uint64 v = CityHash64( *p, strlen(*p));
printf("%Lx : %s\n", v, *p);
}
} #define UNALIGNED_LOAD64(p) (*(const uint64*)(p))
#define UNALIGNED_LOAD32(p) (*(const uint32*)(p))
Any function that uses both of these violates strict aliasing rules and might miscompile on recent gcc versions. Only one of them isn't enough, because char* is allowed to alias anything in C/C++. But if you use both, then you have a uint32_t* and a uint64_t* pointing to the same memory, violating the language spec.ffmpeg has a header of macros to avoid this problem. They have names like AV_RN32A (aligned, native-endian 32-bit read), AV_RL32 (unaligned, little-endian 32-bit read) and so forth.
More specifically, any function that uses both of these to access the same data violates strict aliasing rules. But this would imply that the data is being loaded redundantly, which seems unlikely in an implementation where speed is a top priority.
For example, I do not believe the following function violates strict aliasing rules:
void foo(char *p) {
bar(UNALIAGNED_LOAD64(p), UNALIGNED_LOAD32(p+8));
}Edit: For instance, the blog post mentions being inspirsed by MurmurHash, which touts its performance on its site along with benchmarks (http://sites.google.com/site/murmurhash/):
OneAtATime - 354.163715 mb/sec
FNV - 443.668038 mb/sec
SuperFastHash - 985.335173 mb/sec
lookup3 - 988.080652 mb/sec
MurmurHash 1.0 - 1363.293480 mb/sec
MurmurHash 2.0 - 2056.885653 mb/sec
In fact, it looks like the author of MurmurHash also developed a test suite for hash functions which includes performance testing: http://code.google.com/p/smhasher/wiki/SMHasher> We decided to optimize for speed rather than simplicity and even included special cases for short inputs.
oh, to answer myself, is to help predict the branch direction - it indicates which is likely the common case in the test. http://kerneltrap.org/node/4705
or is the code written in this manner just to make it more platform independent (i.e. cycles still matter for embedded applications, etc)?
The gcc built-in simply will organize the code in a manner that makes the likely statement faster to execute (e.g. branching the unlikely block).
Branch predictors are a runtime mechanism so the processor guesses which outcome of the "if" is more likely.
One is a static optimization, the other is dynamic; if you like to think about it that way.