Hash function families with a similar target use case include: cityhash, falkhash, farmhash, FNV, meowhash, metrohash, murmur, t1ha, wyhash, xxh.
The SMHasher suite tests hash functions for speed, distribution, bias, and collisions. This function ranks well in those tests.
The Readme specifically says "you can modify it to yield 128-bits or more if you want a cryptographically secure hash."
This is a problematic statement, because the function is not designed for cryptography, even if you extended the output to 256 bits. It's not the output length that makes a hash cryptographically secure. (Rather, it's the difference between "you won't find collisions by accident" and "you won't find collisions even if you try really hard using very sophisticated math", and there are further requirements beyond that.)
This is similar to the issues with a different hash by the same GitHub user, posted two weeks ago:
https://news.ycombinator.com/item?id=23103521
Making fast non-cryptographic hash functions is a fun challenge and I appreciate the projects, but please, please do not make any claims about cryptographic properties!
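To make the output-length point concrete, here is a minimal sketch of a deliberately weak toy hash with a 128-bit digest. The names `toy_xor128` and `digest_equal` are hypothetical and exist only for this illustration; this is not DiscoHash or any real hash function. It XORs the input's 16-byte blocks together, so reordering the blocks yields a collision no matter how wide the digest is.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy 128-bit digest, for illustration only (hypothetical type). */
typedef struct { uint8_t b[16]; } digest128;

/* XOR every input byte into position (index mod 16) of the digest.
   The output is 128 bits wide, yet swapping two 16-byte blocks of
   the input leaves the digest unchanged. */
static digest128 toy_xor128(const uint8_t *data, size_t len) {
    digest128 d;
    memset(d.b, 0, sizeof d.b);
    for (size_t i = 0; i < len; i++)
        d.b[i % 16] ^= data[i];
    return d;
}

/* Returns nonzero when both digests are byte-for-byte identical. */
static int digest_equal(digest128 x, digest128 y) {
    return memcmp(x.b, y.b, sizeof x.b) == 0;
}
```

Hashing 16 bytes of 'A' followed by 16 bytes of 'B' gives the same digest as hashing the two blocks in the opposite order. Widening the digest to 256 bits would not change that.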
It's always easy to front as an expert spectator and believe others' work is no good. Why don't you back it up with some effort?
I believe it is a crypto hash. So... a fun challenge?
How about this for a fun challenge: why don't you post a cryptanalysis and I'll link it from the README.
Thanks for the beamsplitter link!
Maybe it is designed as a crypto hash.
I think SipHash is the better choice for non-cryptographic use cases (e.g. hash tables) https://131002.net/siphash/
FWIW, the SMHasher test suite takes the view [1] that defense against hash flooding attacks is a concern for the hash table's collision resolution method, which is a fair point. Nonetheless, SipHash was subsequently adopted by several programming languages' standard libraries for use in hash tables. SipHash is also notable for its clear and concise specification, including security claims, preliminary cryptanalysis, and a discussion on hash flooding [2].
[1] https://github.com/rurban/smhasher#security [2] https://eprint.iacr.org/2012/351.pdf
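For readers who want to see what keyed hashing for a hash table looks like, below is a compact sketch of SipHash-2-4 as I understand it from the specification in [2]. The function names (`siphash24`, `sipround`, `load_le64`, `rotl64`) are my own choices, and this is an illustrative sketch rather than the reference implementation; use a vetted library in practice.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, unsigned b) {
    return (x << b) | (x >> (64 - b));   /* callers pass 0 < b < 64 */
}

/* Read 8 bytes as a little-endian 64-bit word, independent of host order. */
static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* One SipRound over the four-word state. */
static void sipround(uint64_t v[4]) {
    v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
    v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* SipHash-2-4: 2 compression rounds per word, 4 finalization rounds. */
static uint64_t siphash24(const uint8_t key[16], const uint8_t *in, size_t len) {
    const uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
    uint64_t v[4] = {
        k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
        k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL
    };
    const size_t full = len - (len % 8);

    for (size_t off = 0; off < full; off += 8) {
        const uint64_t m = load_le64(in + off);
        v[3] ^= m;
        sipround(v); sipround(v);
        v[0] ^= m;
    }

    /* Last word: remaining bytes, with the length (mod 256) in the top byte. */
    uint64_t b = (uint64_t)len << 56;
    for (size_t i = full; i < len; i++)
        b |= (uint64_t)in[i] << (8 * (i - full));
    v[3] ^= b;
    sipround(v); sipround(v);
    v[0] ^= b;

    v[2] ^= 0xff;
    sipround(v); sipround(v); sipround(v); sipround(v);
    return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

In a hash table, the 16-byte key would be drawn from a random source once per process (or per table), so an attacker who cannot see the key cannot precompute colliding inputs.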
It is really simple and works with unaligned data.
But it is not doing well in benchmarks. I wonder if I should use another one.
Looks like two interleaved 128-bit hashes, don't see any cross-half mixing. Code style needs a bit of cleanup, the sindex stuff obscures the algorithm a bit.
-Austin, SMHasher/Murmur author
I am always interested when these get attention, but I don't know enough about the implications to switch over from just using SHA.
If you're shipping a black-box component that needs to use hashing internally, and the hash outputs don't leak out of the black box, consider switching to a faster non-cryptographic hash to gain performance. Consider the implications of switching (dependencies, trust, performance profile, hash output size, documentation, customer expectation), and note that you will have to discard or otherwise invalidate your past hash outputs.
If you're occasionally applying a hash function and obtain a digest that gets put into long-lived files or records, e.g. you're checksumming your own files for sanity and then verify them later against these records, then you may value availability and stability more than you value performance. If so, don't switch.
If you are fine with the current performance profile, the cost and complexity of switching (or any nontrivial change) may outweigh its benefits, in which case leave everything as-is.
mix(const int A)
{
    const int B = A+1;
    ds[A] *= P;
    ds[A] = rot(ds[A], 23);
    ds[A] *= Q;
    ds[B] ^= ds[A];
    ds[B] *= P;
    ds[B] = rot(ds[B], 23);
    ds[B] *= Q;
}
with P and Q prime.
Just because it's 128-bit doesn't make it cryptographically secure.
Look at meowhash and wyhash instead for the latest and greatest in that field.
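For anyone who wants to poke at the quoted `mix` step, here is a minimal runnable sketch of it. Several things are assumptions on my part and not taken from the project: the values of `P` and `Q` are placeholder primes, `rot` is assumed to be a left rotation, and the state is assumed to be an array of four 64-bit words.

```c
#include <stdint.h>

/* Placeholder primes, assumed for illustration; the real constants may differ. */
static const uint64_t P = 0xFFFFFFFFFFFFFFC5ULL; /* 2^64 - 59 */
static const uint64_t Q = 1099511628211ULL;      /* 64-bit FNV prime */

/* Assumed state layout: four 64-bit words, mixed in pairs (A, A+1). */
static uint64_t ds[4];

/* Assumed to be a left rotation; valid for 0 < n < 64. */
static uint64_t rot(uint64_t v, unsigned n) {
    return (v << n) | (v >> (64 - n));
}

/* Same sequence of operations as the quoted snippet. */
static void mix(const int A) {
    const int B = A + 1;
    ds[A] *= P;
    ds[A] = rot(ds[A], 23);
    ds[A] *= Q;
    ds[B] ^= ds[A];          /* ds[A] feeds into ds[B] ... */
    ds[B] *= P;
    ds[B] = rot(ds[B], 23);
    ds[B] *= Q;              /* ... but ds[B] never feeds back into ds[A] */
}
```

Within a single call the data flow is one-directional: the new `ds[A]` depends only on the old `ds[A]`, while the new `ds[B]` depends on both words. Whether the rest of the algorithm compensates for that is not something this snippet can show.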
function mix(uint64 a, uint64 b) {
    a ^= secret
    b ^= seed
    hi, lo = mul128(a, b)
    seed = hi ^ lo
}
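A rough C rendering of the pseudocode above, as a sketch only. The name `mix_step` and the choice to pass `secret` and `seed` as parameters are mine, and it assumes a compiler that provides `unsigned __int128` (GCC and Clang do on 64-bit targets). It also makes the weak spot easy to see: if an input word equals `secret`, the first operand becomes zero and so does the result.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the pseudocode: returns the new seed.
   Assumes the compiler supports unsigned __int128 (a GCC/Clang extension). */
static uint64_t mix_step(uint64_t a, uint64_t b, uint64_t secret, uint64_t seed) {
    a ^= secret;
    b ^= seed;
    unsigned __int128 r = (unsigned __int128)a * b;  /* full 64x64 -> 128-bit product */
    return (uint64_t)(r >> 64) ^ (uint64_t)r;        /* hi ^ lo */
}
```

With `a == secret`, the product is zero for every `b` and every `seed`, so the returned seed is zero and the contribution of `b` and of the previous seed is lost at that step.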
It's elegantly simple, but depends critically on 'secret' not appearing in the data.
What is the memory bandwidth on that instance? I ask because I'm not seeing that listed [1] but it would be a useful point of comparison. Maybe run Doctor Bandwidth's STREAM benchmark [2].
DiscoHash is included in SMHASHER [3] but its benchmark results aren't.
[1] https://cloud.google.com/compute/docs/machine-types
They are included, but understandably you missed them because it's also called BEBB4185, as stated in the README. Look for the BEBB4185 line in the SMHASHER README.
Good question on the memory bandwidth. From memory it was a multicore system, so I think that has a higher memory bandwidth than a single core. Thanks for the STREAM thing!
http://rurban.github.io/smhasher/doc/table.html
The ecrypt result is completely irrelevant when this cannot in any way be considered cryptographically secure at this point.
For all hashes. A lot of those top ones are not crypto hashes.
> The ecrypt result is completely irrelevant when this can not be in any way be considered cryptographically secure
Break it first, then you can say that. Otherwise, what would you know?
Not sure if it's because English is not your first language, or you just had a grumpy day, or something else, but this came across veeery negative. Be positive in your comments on work.
If you want to be critical, doing so with positivity gives it more credibility because it shows you're able to see both sides, leading people to assume you're more likely making a balanced rather than a biased assessment, whether or not that's true.
There's not even an attempt at a preliminary cryptanalysis anywhere as far as I can tell. I'd advise staying far away from this for cryptography, especially if it considers a 128-bit digest size reasonable for anything but a message authentication code.
HN discussion: https://news.ycombinator.com/item?id=17659672
https://aras-p.info/img/blog/2016-08/hash2-pc.png
And FarmHash64 with SSE4.2 did almost 18 GB/s:
https://aras-p.info/img/blog/2016-08/hash2-farmhashoptions.p...
Full article: https://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests...