I skimmed the paper, and as far as I understood it:
Most cryptographic hash functions operate in fixed-size blocks (for instance, 32 bytes). Additionally, most cryptographic hash function implementations are designed to be streaming, that is, they do not receive the whole input at once. If you give them a partial input which is not a multiple of their block size, these implementations have to buffer the partial input, so that it can be combined with the next partial input (or flushed, if the next call is to finish the computation and generate the output). The arithmetic overflow which leads to the buffer overflow happens when computing how much it has to buffer, given a partial input and any previous partial input already in its internal buffer.
That is, having a file larger than 4GiB is not enough; it has to also be cut into pieces which are not a multiple of the block size (which is normally a power of two). Most users of a cryptographic hash function will either give it the input in large power-of-two pieces (for instance, 8 KiB or 64 KiB), or give it the input all at once, and thus will not hit the bug.
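A minimal sketch of such a streaming update path (illustrative names and structure, not the actual XKCP code) shows where the "how much to buffer" computation lives:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical streaming-hash update path; all names are illustrative. */
#define BLOCK 136            /* SHA3-256 rate in bytes */

struct ctx {
    unsigned char queue[BLOCK];  /* buffered partial block */
    size_t queued;               /* bytes currently buffered */
};

/* Absorb `len` bytes; input that doesn't fill a whole block is buffered
 * until the next call. */
void update(struct ctx *c, const unsigned char *data, size_t len) {
    size_t i = 0;
    while (i < len) {
        /* How many bytes can we take this round? This is the kind of
         * computation where a careless 32-bit cast causes the bug. */
        size_t take = BLOCK - c->queued;
        if (take > len - i)
            take = len - i;
        memcpy(c->queue + c->queued, data + i, take);
        c->queued += take;
        i += take;
        if (c->queued == BLOCK) {
            /* process_block(c->queue); -- permute the state here */
            c->queued = 0;
        }
    }
}
```

Feeding this in odd-sized chunks exercises the buffering path: 10 bytes leaves 10 queued, then 130 more completes one block and leaves 4 queued.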
Buffer overflows and integer overflows/UB are very common. UBSan, ASan, and Valgrind tests are missing. Some projects do offer symbolic verification of the algorithm, but not of the implementations.
See my https://github.com/rurban/smhasher#crypto paragraph, and "Finding Bugs in Cryptographic Hash Function Implementations", Nicky Mouha, Mohammad S Raunak, D. Richard Kuhn, and Raghu Kacker, 2017. https://eprint.iacr.org/2017/891.pdf
https://news.ycombinator.com/item?id=31089216 ... and it's not like there wasn't a FOSS test suite for this.
The paper makes no mention of compiler warnings… but shouldn’t this cast trigger a compiler warning?
Basically, this is why you shouldn't use unsigned types unless you explicitly want them to overflow.
-Weverything also emits lots of noisy warnings, which would dissuade many people from running it by default.
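For what it's worth, GCC and Clang will flag the implicit narrowing under `-Wconversion` (which is not enabled by `-Wall` or `-Wextra`), but an explicit cast is precisely how you tell the compiler a truncation is intentional, so the cast in the bug silences exactly that warning. A minimal pair (illustrative functions, not the real code):

```c
#include <stddef.h>

/* With -Wconversion, this implicit narrowing draws a warning:
 * "conversion from 'size_t' to 'unsigned int' may change value". */
unsigned int narrow_implicit(size_t a, size_t b) {
    return a - b;
}

/* The explicit cast suppresses that warning, even though the
 * truncation is just as real. */
unsigned int narrow_explicit(size_t a, size_t b) {
    return (unsigned int)(a - b);
}
```

So the cast that was meant as a precaution is also what keeps the one relevant diagnostic quiet.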
This is such a critical part of the software stack that we need a more reliable way of validating it than a bunch of people staring at code written in C.
A formal verification implementation would catch it at authoring time, yes.
I just can’t get my head round the idea that software written and reviewed by experts and submitted to the “National Institute of Standards and Technology” with a budget of 1 billion dollars can fuck up this way.
I’m no mathematician but I would have thought implementing pure number crunching code is not rocket science.
Buffer overflow, overwrite memory, run arbitrary code, seriously? LOL, WTF.
Even with memory safe languages, there are dangers. Humanity just hasn't figured out how to produce completely bug-free code at the scale we need in general, let alone in a memory-unsafe language.
partialBlock = (unsigned int)(dataByteLen - i);
Where both `dataByteLen` and `i` were actually `size_t`. Assuming this is close enough to C, what happens is that we're converting a difference of two `size_t` values into a mere `unsigned int`, and since they're not the same size on 64-bit platforms, this can give `partialBlock` the wrong value; the whole thing then snowballs into a catastrophic error that is not trivial to test, because it only happens with huge buffer sizes.
The biggest mistake here is having written `(unsigned int)` instead of `(size_t)`. But the reason it happened in the first place is that they tried to do the right thing: writing the cast as a precaution, even though the following would have worked:
partialBlock = dataByteLen - i;
I really can't fault them: because it was a difference, it could theoretically yield a "negative" result, so intuitively the type of a difference should be signed, and casting it back to unsigned makes the intent crystal clear. I knew C was dangerous, but to be honest I didn't expect such a wicked mind game. Now I'm going to have to take a look at my code.
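To make the failure concrete, here is a hedged sketch (illustrative names, not the real XKCP code) of how the truncated difference misbehaves once the remaining input reaches 4 GiB on an LP64 platform:

```c
#include <stddef.h>

#define RATE 136  /* SHA3-256 block ("rate") size in bytes */

/* Correct: the 64-bit difference is clamped to one block. */
size_t partial_len_ok(size_t dataByteLen, size_t i) {
    size_t remaining = dataByteLen - i;
    return remaining < RATE ? remaining : RATE;
}

/* Buggy: the difference is truncated to 32 bits before any clamping,
 * so a remaining length of exactly 2^32 bytes becomes 0, and lengths
 * slightly above 2^32 become small bogus values. */
size_t partial_len_buggy(size_t dataByteLen, size_t i) {
    unsigned int partialBlock = (unsigned int)(dataByteLen - i);
    return partialBlock < RATE ? partialBlock : RATE;
}
```

With 2^32 bytes remaining, the correct version yields a full block of 136 bytes while the buggy one yields 0; with 2^32 + 7 remaining, the buggy one yields 7. Once the byte counter and the internal queue index disagree like this, the out-of-bounds write follows.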
Shipping a bug-free non-trivial program, one with more than a few thousand lines of code, borders on impossible.
It's true that Snowden revealed the NSA had their fingers in NIST's cryptographic standards team, with Dual-EC a specific example that is considered suspicious since the revelations. So a suspicion of malice is not completely unfounded, but as far as I can tell, this code was written by the Keccak team not NIST itself, and for any claim of "they're NSA stooges", I would need evidence.
It also seems to me that the Keccak team are not stupid people. That leaves "honest mistake" as the most likely explanation.
There are lots of studies on human behaviour that show a competent and diligent human will mess up every Nth time they perform a routine task; some forms of probabilistic risk analysis take N = 10^3 as a lower bound for this.
There is a long history of railroad operation in the UK (which invented the steam train, after all) where, occasionally, a signaller would send an express onto a line where they'd forgotten a local was standing, or send two trains head-on down a single line from opposite ends. This led to the development of interlockings and token-working systems as technological mitigations for human error, and later to today's computerised safety systems, because signallers are (almost always) neither stupid nor malicious, but still human. The same can be said for programmers.
(As I understand it, the recent train disaster in Greece was on a line where interlocking should have been in place, but it wasn't active.)
Do you think everything (arguably anything) is released flawless?