Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is vulnerable for over a decade, but that vulnerability was dismissed with a lot of hand-waving over at Git. Is it a very difficult technical problem to switch, or just a problem of backward compatibility for existing repos (i.e., it would be expensive to change everything over)?
A bit of both. Git has an ongoing effort to replace every place in the source code that passes SHA-1s around as fixed-size byte arrays with a proper data structure, which will make it possible to swap out the hash. But even with that work done, Git will still need to support reading and writing repositories in the existing format, along with numerous interoperability measures.
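To make the "fixed-size array vs. data structure" point concrete, here is a rough Python sketch. It is not Git's actual C code; the names `ObjectId`, `HASH_SIZES`, and `make_object_id` are made up for illustration. The idea is that once the ID carries its algorithm with it, nothing else needs to hard-code "20 bytes".

```python
import hashlib
from dataclasses import dataclass

# Digest sizes in bytes for the algorithms this sketch knows about.
HASH_SIZES = {"sha1": 20, "sha256": 32}


@dataclass(frozen=True)
class ObjectId:
    """Hypothetical object ID that records which hash produced it."""
    algo: str
    digest: bytes

    def __post_init__(self):
        expected = HASH_SIZES.get(self.algo)
        if expected is None:
            raise ValueError(f"unknown hash algorithm: {self.algo}")
        if len(self.digest) != expected:
            raise ValueError(f"{self.algo} digest must be {expected} bytes")

    def hex(self) -> str:
        return self.digest.hex()


def make_object_id(algo: str, data: bytes) -> ObjectId:
    """Hash `data` with the named algorithm and wrap the result."""
    return ObjectId(algo, hashlib.new(algo, data).digest())
```

Code that takes an `ObjectId` instead of a bare 20-byte buffer keeps working when a second algorithm is added to the table, which is roughly why the refactoring has to come before any hash switch.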
Back then Linus shot this down in his typical abrasive fashion:
> It is simply NOT TRUE that you can generate an object that looks halfway sane and still gets you the sha1 you want.
The key phrase is "looks halfway sane". Git doesn't just look at the hash; it looks at the object structure too (headers), and that makes it highly resistant to weaknesses in the crypto alone. His point, essentially, is that you should design to expect crypto/hash vulnerabilities, and that's a smart stance, as they are discovered every few years.
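For anyone wondering what "headers" means here: Git hashes a short header (object type, a space, the content length in decimal, and a NUL byte) followed by the content, not the raw file bytes. A minimal Python sketch of that computation, where the function name `git_object_id` is my own:

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over "<type> <size>\\0" + content, as Git does for loose objects."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The well-known ID of the empty blob:
# git_object_id("blob", b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```

Because the type and length are part of the hashed data, a collision found on raw file contents does not automatically carry over to Git object IDs; the attacker has to target the prefixed form.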
Linus was not talking about the object headers, but about the object contents. It's harder to make the colliding objects look like sane C code, without some strange noise in the middle (which wouldn't be accepted by the project maintainers).
Yes, it's a "C project"-centric view, but consider the date: it was the early days of git. The main way of receiving changes was emailed patches, not pull requests. Binary junk would have a hard time getting in. And even if it did get in, the earliest copy of the object wins, as mentioned earlier in the thread, as long as the maintainers added "--ignore-existing" to the rsync command in their pull scripts (yeah, this thread seems to be from before the git fetch protocol).
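The "earliest copy wins" behaviour is easy to picture with a toy store. This is only a sketch of the policy, not of Git or rsync; the class `ObjectStore` and its methods are hypothetical, and object names are passed in directly so a collision can be simulated without a real one:

```python
class ObjectStore:
    """Toy object store where an existing object is never overwritten."""

    def __init__(self):
        self._objects = {}

    def receive(self, name: str, content: bytes) -> bool:
        """Store `content` under `name` unless that name is already taken.

        Returns True if the object was stored, False if it was ignored.
        """
        if name in self._objects:
            return False  # keep the earliest copy, drop the newcomer
        self._objects[name] = content
        return True

    def get(self, name: str) -> bytes:
        return self._objects[name]


store = ObjectStore()
store.receive("deadbeef", b"good object")         # stored
store.receive("deadbeef", b"colliding impostor")  # ignored
```

Under this policy, a colliding object that arrives later cannot replace one the maintainer already has; it would have to get there first.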
> You are _literally_ arguing for the equivalent of "what if a meteorite hit my plane while it was in flight - maybe I should add three inches of high-tension armored steel around the plane, so that my passengers would be protected".
> That's not engineering. That's five-year-olds discussing building their imaginary forts ("I want gun-turrets and a mechanical horse one mile high, and my command center is 5 miles under-ground and totally encased in 5 meters of lead").
> If we want to have any kind of confidence that the hash is really unbreakable, we should make it not just longer than 160 bits, we should make sure that it's two or more hashes, and that they are based on totally different principles.
> And we should all digitally sign every single object too, and we should use 4096-bit PGP keys and unguessable passphrases that are at least 20 words in length. And we should then build a bunker 5 miles underground, encased in lead, so that somebody cannot flip a few bits with a ray-gun, and make us believe that the sha1's match when they don't. Oh, and we need to all wear aluminum propeller beanies to make sure that they don't use that ray-gun to make us do the modification _ourselves_.
> So please stop with the theoretical sha1 attacks. It is simply NOT TRUE that you can generate an object that looks halfway sane and still gets you the sha1 you want. Even the "breakage" doesn't actually do that. And if it ever _does_ become true, it will quite possibly be thanks to some technology that breaks other hashes too.
> I worry about accidental hashes, and in 160 bits of good hashing, that just isn't an issue.
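To put a number on "that just isn't an issue": a back-of-the-envelope birthday-bound estimate, assuming the hash behaves like a uniformly random function (which covers accidental collisions only, not deliberate attacks). The function name `collision_probability` is my own.

```python
import math


def collision_probability(num_objects: int, hash_bits: int = 160) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1)/2 / 2^bits)."""
    pairs = num_objects * (num_objects - 1) / 2
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


# A billion objects in one repository:
p_billion = collision_probability(10**9)

# Roughly 2^80 objects are needed before a collision becomes likely:
p_birthday = collision_probability(2**80)
```

With a billion objects the estimate is on the order of 10^-31, and it only approaches even odds around 2^80 objects, which is the sense in which accidental collisions are not a practical concern at 160 bits.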
I don't mean to say that you are being inaccurate, just that his position seems a little different now:
"Again, I'm not arguing that people shouldn't work on extending git to a new (and bigger) hash. I think that's a no-brainer, and we do want to have a path to eventually move towards SHA3-256 or whatever" http://marc.info/?l=git&m=148787457024610&w=2
I was just answering the question "Why hasn't Git switched... People have been warning that SHA-1 is vulnerable for over a decade".
Linus' 12-year-old opinions are the relevant thing for why it hadn't changed. A decade from now, things may be different.
> Do we want to migrate to another hash? Yes.