
0 points · geofft · 11y ago

Assuming a properly designed hash function, the probability is no higher than about one in 300 undecillion. No known algorithmic flaw would let an attacker take an arbitrary input to MD5, SHA-1, SHA-2, or SHA-3 and construct a colliding input in noticeably better time than brute force. There are attacks that let an attacker construct two inputs with the same hash for MD5 and SHA-1. So, out of an abundance of caution, we're considering MD5 and SHA-1 both "broken", there are now two successor hashes to SHA-1, and we're hurrying to replace all uses of MD5 and SHA-1 where collisions matter.
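For a rough sense of scale, here is a minimal Python sketch of the arithmetic. It assumes the "300 undecillion" figure refers to a 128-bit digest (2^128 is roughly 340 undecillion); that digest size, and the helper name `birthday_bound`, are illustrative assumptions and not something stated above.

```python
from fractions import Fraction

# Assumption: a 128-bit digest, chosen because 2**128 matches the quoted scale.
HASH_BITS = 128
UNDECILLION = 10 ** 36

# Chance that one specific other input lands on the same digest as a given
# input, for an ideal hash function: 1 in 2**HASH_BITS.
p_single = Fraction(1, 2 ** HASH_BITS)
odds_in_undecillions = (2 ** HASH_BITS) // UNDECILLION


def birthday_bound(n_items: int, bits: int) -> Fraction:
    """Union-bound upper limit on the chance that any two of n_items collide."""
    pairs = n_items * (n_items - 1) // 2
    return min(Fraction(1), Fraction(pairs, 2 ** bits))


print(odds_in_undecillions)  # 340
# Even a trillion stored items under a 256-bit hash stay far below any
# realistic hardware error rate.
print(float(birthday_bound(10 ** 12, 256)))
```

The second figure is the one that matters for a store holding many items: the bound grows with the number of pairs, but with a 256-bit digest it remains negligible.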

If you're not designing your software to worry about cosmic rays, you shouldn't design your software to worry about hash collisions. Just pick a good hash.



Perseids · 11y ago

I disagree with your threat model. Why do you think that the ability to create two colliding functions with basically arbitrary content does not endanger the end user? E.g. (1) I write (or copy) a useful function f1, (2) create an evil function f2, (3) manipulate the metadata of both functions to create f1' and f2', which are functionally identical to f1 and f2 but hash to the same value, (4) publish f1'. (5) People start to use f1', (6) an end user requests hash(f1') from the key-value store, (7) I man-in-the-middle the connection and return f2' instead, (8) the end user executes f2' and is compromised.
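To make the scenario concrete, here is a minimal sketch of the client-side check that the attack has to get past. The store is modelled as a plain dict and every name here (`key_for`, `fetch_verified`, the sample functions) is hypothetical; SHA-256 is an assumed choice, not something the thread specifies.

```python
import hashlib


def key_for(blob: bytes) -> str:
    """Content address of a blob: the hex SHA-256 digest of its bytes."""
    return hashlib.sha256(blob).hexdigest()


def fetch_verified(store: dict, key: str) -> bytes:
    """Fetch a blob by hash and reject it if the content does not match."""
    blob = store[key]
    if key_for(blob) != key:
        raise ValueError("content does not match requested hash")
    return blob


f1 = b"def f(x): return x + 1"   # the useful function
f2 = b"def f(x): steal(x)"       # the evil replacement

key = key_for(f1)
honest_store = {key: f1}
tampered_store = {key: f2}       # what a man-in-the-middle would serve

print(fetch_verified(honest_store, key) == f1)  # True
try:
    fetch_verified(tampered_store, key)
except ValueError as err:
    print("rejected:", err)
```

The substitution in step (7) only succeeds if f2' produces the same digest as f1', so the whole scheme rests on the collision resistance of whichever hash computes the key.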

geofft (OP) · 11y ago

First off, because you simply don't have that ability in SHA-2 or SHA-3. If you're designing a new system, don't use MD5 or SHA-1; if you're using a legacy system, organize some sort of orderly panic (just like the CA/Browser Forum is doing with SHA-1 certificates).

Second, because it's not "basically arbitrary": the content almost always looks like it's been specifically designed to make room for a hash collision. For binary formats like images and PDFs, it's easy to put a large amount of unrendered data in the file format that isn't visible in a viewer, which is exactly why the newsworthy collisions have been images and PDFs. Even X.509 certificates allow you some room for arbitrary data. For program code, having a bunch of arbitrary data in the middle of the function would look extremely suspicious. (Obviously you shouldn't design the metadata format of your functions to permit enough arbitrary data, but given the talk of alpha conversions, it sounds like the proposal is to hash a canonical, high-information-density representation of the function.)

But again, this doesn't come up unless you're using a broken hash. My argument is just that even the "broken" hashes really aren't very broken, so if you're using the non-broken ones, you should basically assume they're perfectly secure.

Perseids · 11y ago

> First off, because you simply don't have that ability in SHA-2 or SHA-3. If you're designing a new system, don't use MD5 or SHA-1;

Then don't mention MD5 and SHA-1 in the first place. The sooner they leave everyone's mind as valid alternatives, the better.

> For binary formats like images and PDFs, it's easy to put a large amount of unrendered data in the file format that isn't visible in a viewer

MD5 collisions have gotten much shorter in recent years [1].

> For program code, having a bunch of arbitrary data in the middle of the function would look extremely suspicious. (Obviously you shouldn't design the metadata format of your functions to permit enough arbitrary data,

And you trust the designers of these formats to know this "obvious" fact? You reference code normalization, but there was talk in this thread about keys that are to be associated with the functions to allow updates (and thus included in the hash), and I think it is perfectly valid to include graphics in the documentation of a function.

> My argument is just that even the "broken" hashes really aren't very broken, so if you're using the non-broken ones, you should basically assume they're perfectly secure.

My point is that this "structured collision resistance" is used far too often as a hand-waving argument for why some specific protocol can continue to use a broken hash. (Remember how CAs said the same things about X.509 certificates before Appelbaum, Molnar et al. [2] presented an actual proof of concept?) Software developers already have difficulty distinguishing pre-image resistance from collision resistance. Giving them yet another argument to shoot themselves in the foot with is not a good idea.

[1] http://www.win.tue.nl/hashclash/SingleBlock/ and http://marc-stevens.nl/research/md5-1block-collision/

[2] http://www.win.tue.nl/hashclash/rogue-ca/

