The reason SVN breaks is its "rep-sharing" feature, i.e. file content deduplication. It uses a SQLite database to share the representation of files based on their raw SHA1 checksum; for details see http://svn.apache.org/repos/asf/subversion/trunk/subversion/...
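To make the failure mode concrete, here is a minimal Python sketch of deduplication keyed on a single hash. It is not SVN's code: `RepCache`, `put` and `get` are hypothetical names, and a plain dict stands in for the SQLite rep-cache. The `key` parameter exists only so a collision can be simulated without the real PDFs.

```python
import hashlib
from typing import Callable, Dict


def sha1_hex(data: bytes) -> str:
    """Raw SHA1 of the content, used as the dedup key."""
    return hashlib.sha1(data).hexdigest()


class RepCache:
    """Toy content store that deduplicates on one hash (hypothetical)."""

    def __init__(self, key: Callable[[bytes], str] = sha1_hex) -> None:
        self._key = key
        self._reps: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        k = self._key(data)
        # Dedup: if a representation with this key exists, reuse it and do
        # NOT store the new bytes. A colliding file silently ends up
        # pointing at someone else's content.
        if k not in self._reps:
            self._reps[k] = data
        return k

    def get(self, k: str) -> bytes:
        return self._reps[k]


if __name__ == "__main__":
    # Simulate a collision with a deliberately broken key function.
    cache = RepCache(key=lambda data: "same-key")
    cache.put(b"shattered-1 contents")
    second = cache.put(b"shattered-2 contents")
    print(cache.get(second))  # the first file's bytes come back
```

Because the second `put` stores nothing, the mix-up can only be noticed through a secondary checksum recorded elsewhere, which appears to be what the MD5 in the "Checksum mismatch" error further down is doing.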
You can mitigate this vulnerability by setting enable-rep-sharing = false in fsfs.conf; see the documentation in that file or in the source at http://svn.apache.org/viewvc/subversion/trunk/subversion/lib...
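The file lives at `db/fsfs.conf` inside the repository. Below is a small Python sketch that reports whether rep-sharing is still on. It assumes the option sits under a `[rep-sharing]` section and that an absent option means "enabled"; that matches the comments in fsfs.conf files I've seen, but check your own. `rep_sharing_enabled` is a hypothetical helper name.

```python
import configparser

# Example of the setting as it might appear in db/fsfs.conf (assumed layout).
SAMPLE = """\
[rep-sharing]
### rep-sharing is enabled by default.
enable-rep-sharing = false
"""


def rep_sharing_enabled(conf_text: str) -> bool:
    """True unless the config explicitly sets enable-rep-sharing = false."""
    # interpolation=None so stray '%' characters elsewhere can't break parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Assumption: a missing section or option means the default, i.e. enabled.
    return parser.getboolean("rep-sharing", "enable-rep-sharing", fallback=True)
```

To check a real repository, pass it the file contents, e.g. `rep_sharing_enabled(Path("/path/to/repo/db/fsfs.conf").read_text())` with a path of your own (`Path` from `pathlib`).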
This feature was introduced in SVN 1.6 (released 2009) and made more aggressive in SVN 1.8 (released 2013); see https://subversion.apache.org/docs/release-notes/
SVN exposes the SHA1 checksum as part of its external API, but its deduplication could easily have been built on a more secure foundation. Their decision to double down on SHA1 in 2013 was foolish.
I suspect it's a minor bug, and that once it is fixed they can keep using SHA1 as before, without the denial of service when somebody tries this. For example, if somebody tries to commit two files with the same SHA1 but different MD5s, SVN can reject the second one before accepting it. Or, if two different files with the same SHA1 have already been accepted and only one copy of the content is stored, SVN can still continue to work. You can't get the second file back unless, for example, you first put it in some archive format and then commit that; that's your problem, but SVN would still work for everything else.
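The "reject the second one" idea could look roughly like the toy sketch below: when a new file's SHA1 matches an existing representation but its MD5 doesn't, refuse it instead of sharing. This is not SVN's code; `GuardedRepCache` and `CollisionError` are hypothetical, and the `key` parameter is only there to simulate a collision.

```python
import hashlib
from typing import Callable, Dict, Tuple


class CollisionError(Exception):
    """Raised when new content shares a SHA1 with different stored content."""


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class GuardedRepCache:
    def __init__(self, key: Callable[[bytes], str] = sha1_hex) -> None:
        self._key = key
        # key -> (md5 of the stored content, the stored content)
        self._reps: Dict[str, Tuple[str, bytes]] = {}

    def put(self, data: bytes) -> str:
        k = self._key(data)
        md5 = hashlib.md5(data).hexdigest()
        existing = self._reps.get(k)
        if existing is None:
            self._reps[k] = (md5, data)
        elif existing[0] != md5:
            # Same SHA1, different MD5: reject rather than silently share.
            raise CollisionError(f"SHA1 {k} already stored with different content")
        return k

    def get(self, k: str) -> bytes:
        return self._reps[k][1]
```

Checking the MD5 follows the suggestion above; comparing the full bytes (`existing[1] != data`) would be the stricter check, at the cost of reading back the old representation on every match.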
In short, it sounds like a denial of service at the moment, but I think the DoS can be avoided without changing the hash algorithm.
However, I'm sure SVN is not the only code base that had never, until now, been tested with two different files that have the same SHA1.
Andreas Stieger (SUSE, SVN) has written a pre-commit hook script that rejects commits of shattered.io-style PDFs:
https://svn.apache.org/viewvc/subversion/trunk/tools/hook-sc...
This is the first mitigation available. If you are responsible for an SVN server at risk, please make use of this hook.
If somebody could make a similar hook for Windows and post it here or to dev@subversion.apache.org that would be highly appreciated.
(edit: switched script link to HTTPS)
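For anyone curious about the general shape of such a hook, here is a rough Python sketch. It is not Andreas Stieger's script (that is the link above); it only rejects files whose SHA1 equals the one shared by the two published shattered PDFs, so it will not catch new collisions. It assumes SVN's pre-commit arguments (repository path, transaction name) and the usual `svnlook changed` output where the path starts at column 4; `is_known_collision` and the other function names are hypothetical.

```python
import hashlib
import subprocess
import sys
from typing import Iterable, List, Set

# SHA1 shared by shattered-1.pdf and shattered-2.pdf (sha1sum output in this thread).
KNOWN_COLLIDING_SHA1: Set[str] = {"38762cf7f55934b34d179ae6a4c80cadccbb7f0a"}


def is_known_collision(data: bytes, known: Set[str] = KNOWN_COLLIDING_SHA1) -> bool:
    return hashlib.sha1(data).hexdigest() in known


def added_or_updated_files(repos: str, txn: str) -> Iterable[str]:
    out = subprocess.run(
        ["svnlook", "changed", "-t", txn, repos],
        check=True, capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        # Assumed format: action letter in column 0, path from column 4.
        action, path = line[:1], line[4:]
        # Skip deletions, property-only changes and directories (trailing "/").
        if action in ("A", "U") and not path.endswith("/"):
            yield path


def main(argv: List[str]) -> int:
    repos, txn = argv[1], argv[2]
    for path in added_or_updated_files(repos, txn):
        data = subprocess.run(
            ["svnlook", "cat", "-t", txn, repos, path],
            check=True, capture_output=True,
        ).stdout
        if is_known_collision(data):
            # stderr from a failing pre-commit hook is shown to the client.
            print(f"Rejected {path}: known SHA1 collision", file=sys.stderr)
            return 1
    return 0


# SVN runs the hook as: pre-commit REPOS-PATH TXN-NAME
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Hooks are run with an empty environment, so in practice you would probably need the absolute path to `svnlook`.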
Of course, I first tested this on our main production repository at work because... oh, wait, I didn't, because what were you thinking?!
My guess is that Git wouldn't be 'hosed' like SVN, since it currently doesn't have a secondary hash to detect the corruption. It would simply restore the wrong file without noticing anything was amiss.
Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is vulnerable for over a decade, but that vulnerability was dismissed with a lot of hand-waving over at Git. Is it a very difficult technical problem to switch, or just a problem of backward compatibility for existing repos (i.e., it would be expensive to change everything over)?
A bit of both. Git has an ongoing effort to replace every place in the source code that passes SHA1s around as fixed-size byte arrays with a dedicated data structure. That will make it possible to swap out the hash. But even with that work, Git will still need to support reading and writing repositories in the existing format, plus numerous interoperability measures.
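A rough Python illustration of that refactoring idea: instead of a bare 20-byte array, callers pass a small value type that records which algorithm produced the digest, so nothing hard-codes the SHA1 length. Git's C code uses a `struct object_id` for this; the `ObjectId` class and `hash_bytes` function below are hypothetical and only mimic the idea, not Git's actual layout.

```python
import hashlib
from dataclasses import dataclass

# Digest sizes in bytes for the algorithms this sketch supports.
RAW_SIZE = {"sha1": 20, "sha256": 32}


@dataclass(frozen=True)
class ObjectId:
    algo: str
    raw: bytes

    def __post_init__(self) -> None:
        # Reject digests whose length doesn't match the declared algorithm.
        if len(self.raw) != RAW_SIZE[self.algo]:
            raise ValueError(f"{self.algo} ids must be {RAW_SIZE[self.algo]} bytes")

    def hex(self) -> str:
        return self.raw.hex()


def hash_bytes(data: bytes, algo: str = "sha1") -> ObjectId:
    # Callers only ever see ObjectId, so switching algorithms is one argument.
    return ObjectId(algo, hashlib.new(algo, data).digest())
```

Ids from different algorithms never compare equal, which is one of the things a transition has to handle explicitly when old and new repositories interoperate.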
Back then Linus shot this down in his typical abrasive fashion:
> You are _literally_ arguing for the equivalent of "what if a meteorite hit my plane while it was in flight - maybe I should add three inches of high-tension armored steel around the plane, so that my passengers would be protected".
> That's not engineering. That's five-year-olds discussing building their imaginary forts ("I want gun-turrets and a mechanical horse one mile high, and my command center is 5 miles under-ground and totally encased in 5 meters of lead").
> If we want to have any kind of confidence that the hash is really unbreakable, we should make it not just longer than 160 bits, we should make sure that it's two or more hashes, and that they are based on totally different principles.
> And we should all digitally sign every single object too, and we should use 4096-bit PGP keys and unguessable passphrases that are at least 20 words in length. And we should then build a bunker 5 miles underground, encased in lead, so that somebody cannot flip a few bits with a ray-gun, and make us believe that the sha1's match when they don't. Oh, and we need to all wear aluminum propeller beanies to make sure that they don't use that ray-gun to make us do the modification _ourselves_.
> So please stop with the theoretical sha1 attacks. It is simply NOT TRUE that you can generate an object that looks halfway sane and still gets you the sha1 you want. Even the "breakage" doesn't actually do that. And if it ever _does_ become true, it will quite possibly be thanks to some technology that breaks other hashes too.
> I worry about accidental hashes, and in 160 bits of good hashing, that just isn't an issue.
I really tried with SVN (I wanted something better than CVS) for quite a long time.
I much prefer that git's designed to let me do such things and provides tools for doing so, but you can totally rewire svn repos with vi and a bunch of swearing if necessary.
(and I was using svk for a merge tool at the time so I did have the option to burn it down and rebuild from scratch; unhosing svn repos wasn't quite unpleasant enough for me to want to do so)
Then again, I started off doing more ops than dev and have also happily hand-edited mysql replication logs to unfuck things after a partial failover, so I may have more of a masochistic streak than you do :)
The bugs you can expect from software that assumed no hash collisions are going to be pretty arbitrary. There was that Stack Overflow post about what happens in Git with collisions, and it didn't seem great either; it's just that what gets hashed happens not to collide in this case.
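"What gets hashed" is the key point: Git doesn't hash the raw file, it hashes a header (`blob`, a space, the decimal size, a NUL byte) followed by the content. A collision crafted for the raw bytes of the shattered PDFs therefore doesn't carry over to their Git object IDs. A small Python sketch of that computation, matching what `git hash-object` prints; `git_blob_id` is a hypothetical helper name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # Git hashes b"blob <decimal length>\0" + content, not the raw file.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The first value is the well-known empty-blob ID; the second is what `echo hello | git hash-object --stdin` prints.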
https://lists.webkit.org/pipermail/webkit-dev/2017-February/...
> For the record: the commits have been deleted, but the SVN is still hosed.
> Subversion 1.8 avoids downloading pristine content that is already present in the cache, based on the content's SHA1 or MD5 checksum.
https://subversion.apache.org/docs/release-notes/1.8.html#pr...
I'm guessing shattered-1.pdf and shattered-2.pdf have identical hashes but distinct contents. It's not clear to me why this results in a "checksum mismatch."
Checksum mismatch: LayoutTests/http/tests/cache/disk-cache/resources/shattered-2.pdf
expected: 5bd9d8cabc46041579a311230539b8d1
got: ee4aa52b139d925f8d8884402b0a750c
EDIT: see https://news.ycombinator.com/item?id=13725312 for the answer
$ sha1sum shattered*
38762cf7f55934b34d179ae6a4c80cadccbb7f0a shattered-1.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a shattered-2.pdf
$ md5sum shattered*
ee4aa52b139d925f8d8884402b0a750c shattered-1.pdf
5bd9d8cabc46041579a311230539b8d1 shattered-2.pdf
As you can see, the SHA1 sums are identical while the MD5 sums differ.
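That output explains the error: the two PDFs share a SHA1, so SHA1-keyed reuse hands back shattered-1's bytes (MD5 ee4aa52b139d925f8d8884402b0a750c) when shattered-2 (MD5 5bd9d8cabc46041579a311230539b8d1) was expected, and the MD5 check catches it. A minimal Python sketch of both halves: computing the two digests like `sha1sum`/`md5sum`, and a verification step that fails with the same expected/got shape as the SVN message. `digests` and `verify_md5` are hypothetical names; SVN's real check is in its C libraries.

```python
import hashlib
import sys
from pathlib import Path
from typing import Tuple


def digests(data: bytes) -> Tuple[str, str]:
    """Return (sha1, md5) hex digests, as sha1sum and md5sum print them."""
    return hashlib.sha1(data).hexdigest(), hashlib.md5(data).hexdigest()


def verify_md5(name: str, data: bytes, expected_md5: str) -> None:
    """Raise if the content's MD5 differs from the recorded one."""
    got = hashlib.md5(data).hexdigest()
    if got != expected_md5:
        raise ValueError(
            f"Checksum mismatch: {name}\n expected: {expected_md5}\n got: {got}"
        )


if __name__ == "__main__":
    # e.g. python digests.py shattered-1.pdf shattered-2.pdf
    for arg in sys.argv[1:]:
        sha1, md5 = digests(Path(arg).read_bytes())
        print(sha1, md5, arg)
```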