The reason svn is broken is its "rep-sharing" feature, i.e. file content deduplication. It uses a SQLite database to share the representation of files based on their raw SHA1 checksum - for details see http://svn.apache.org/repos/asf/subversion/trunk/subversion/...
You can mitigate this vulnerability by setting enable-rep-sharing = false in fsfs.conf - see documentation in that file or in the source at http://svn.apache.org/viewvc/subversion/trunk/subversion/lib...
This feature was introduced in svn 1.6 (released 2009) and made more aggressive in svn 1.8 (released 2013): https://subversion.apache.org/docs/release-notes/
SVN exposes the SHA1 checksum as part of its external API, but its deduplication could easily have been built on a more secure foundation. Their decision to double down on SHA1 in 2013 was foolish.
I rather believe it's a minor bug, and that once it is fixed, they can keep using SHA1 as before, without the denial of service when somebody tries a collision. For example, if somebody tries to commit two files with the same SHA1 but different MD5, SVN can reject the second one instead of accepting it. Or, if two different files with the same SHA1 were both accepted and only one content is stored, SVN can still continue to work. You can't get the second file back unless you, for example, wrapped it in some archive format before committing it, but OK, that's your problem; SVN would still work for anything else.
In short, it sounds like a denial of service at the moment, but I think that DOS can be avoided without changing the hash algorithm.
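To make the "reject the second one" idea concrete, here is a minimal Python sketch. It is not SVN's implementation: `DedupStore`, `CollisionError` and the injectable `key_hash` / `check_hash` parameters are hypothetical names made up for illustration. It only shows that a store deduplicating on SHA1 can refuse colliding content by cross-checking a second hash.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class CollisionError(Exception):
    """New content has the same key hash as different stored content."""


class DedupStore:
    """Toy content store that shares representations by key hash.

    Hypothetical illustration only; not how SVN's rep-cache is coded.
    """

    def __init__(self, key_hash=sha1_hex, check_hash=md5_hex):
        self._key_hash = key_hash      # hash used for deduplication
        self._check_hash = check_hash  # secondary hash used to spot collisions
        self._reps = {}                # key -> (check digest, content)

    def __len__(self):
        return len(self._reps)

    def put(self, content: bytes) -> str:
        key = self._key_hash(content)
        check = self._check_hash(content)
        existing = self._reps.get(key)
        if existing is None:
            # First time we see this key: store the representation.
            self._reps[key] = (check, content)
        elif existing[0] != check:
            # Same key, different secondary hash: refuse instead of corrupting.
            raise CollisionError(f"key {key} already maps to different content")
        return key

    def get(self, key: str) -> bytes:
        return self._reps[key][1]
```

The hash functions are injectable only so the collision path can be exercised with a deliberately weak key hash; with the defaults, identical content is stored once and a SHA1 collision with a different MD5 is rejected at `put` time.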
However, I'm sure that SVN is not the only code base that, up to now, was never tested with two different files that have the same SHA1.
Andreas Stieger (SUSE, SVN) has written a pre-commit hook script which rejects commits of shattered.io-style PDFs:
https://svn.apache.org/viewvc/subversion/trunk/tools/hook-sc...
This is the first mitigation available. If you are responsible for an SVN server at risk, please make use of this hook.
If somebody could make a similar hook for Windows and post it here or to dev@subversion.apache.org that would be highly appreciated.
(edit: switched script link to HTTPS)
Of course, I first tested this on our main production repository at work because...oh, wait, I didn't because what were you thinking?!
My guess is that Git wouldn't be 'hosed' like SVN, since it currently doesn't have a secondary hash to detect the corruption. It would simply restore the wrong file without noticing anything was amiss.
Why hasn't Git switched to SHA-2? People have been warning that SHA-1 is vulnerable for over a decade, but that vulnerability was dismissed with a lot of hand-waving over at Git. Is it a very difficult technical problem to switch, or just a problem of backward compatibility for existing repos (i.e., it would be expensive to change everything over)?
I really tried with SVN (wanted something better than CVS) for quite a long time.
I much prefer that git's designed to let me do such things and provides tools for doing so, but you can totally rewire svn repos with vi and a bunch of swearing if necessary.
(and I was using svk for a merge tool at the time so I did have the option to burn it down and rebuild from scratch; unhosing svn repos wasn't quite unpleasant enough for me to want to do so)
Then again, I started off doing more ops than dev and have also happily hand-edited mysql replication logs to unfuck things after a partial failover, so I may have more of a masochistic streak than you do :)
The bugs you can expect from software that assumed no hash collisions are going to be pretty arbitrary. There was that Stack Overflow post about what happens with Git with collisions, and it didn't seem great either; it's just that what gets hashed happens not to collide in this case.
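On "what gets hashed": Git does not hash the raw file bytes. A blob's object ID is the SHA1 of a short header (`blob <size>` followed by a NUL byte) plus the content, so a collision between two raw files does not automatically carry over. A small Python sketch of that computation (the function name `git_blob_id` is just for illustration):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA1 that Git assigns to a blob: header 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def raw_sha1(content: bytes) -> str:
    """SHA1 over the raw bytes, as printed by sha1sum."""
    return hashlib.sha1(content).hexdigest()
```

For any given content, `git_blob_id` and `raw_sha1` differ because of the prepended header, which should match what `git hash-object` prints for the same file.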
https://lists.webkit.org/pipermail/webkit-dev/2017-February/...
> For the record: the commits have been deleted, but the SVN is still hosed.
> Subversion 1.8 avoids downloading pristine content that is already present in the cache, based on the content's SHA1 or MD5 checksum.
https://subversion.apache.org/docs/release-notes/1.8.html#pr...
I'm guessing shattered-1.pdf and shattered-2.pdf have identical hashes but distinct contents. It's not clear to me why this results in a "checksum mismatch."
Checksum mismatch: LayoutTests/http/tests/cache/disk-cache/resources/shattered-2.pdf
expected: 5bd9d8cabc46041579a311230539b8d1
got: ee4aa52b139d925f8d8884402b0a750c
EDIT: see https://news.ycombinator.com/item?id=13725312 for the answer
$ sha1sum shattered*
38762cf7f55934b34d179ae6a4c80cadccbb7f0a shattered-1.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a shattered-2.pdf
$ md5sum shattered*
ee4aa52b139d925f8d8884402b0a750c shattered-1.pdf
5bd9d8cabc46041579a311230539b8d1 shattered-2.pdf
As you can see.
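The same check can be scripted. Below is a minimal Python sketch, assuming the two files are available locally; `file_digests` and `is_sha1_collision` are illustrative names, not part of any SVN tooling. It reports a pair as a collision when the SHA1 digests match but the MD5 digests differ, which is exactly the situation in the output above.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return (sha1_hex, md5_hex) for a file, reading it in chunks."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
    return sha1.hexdigest(), md5.hexdigest()


def is_sha1_collision(path_a, path_b):
    """True if the files share a SHA1 but have different MD5 digests."""
    sha1_a, md5_a = file_digests(path_a)
    sha1_b, md5_b = file_digests(path_b)
    return sha1_a == sha1_b and md5_a != md5_b
```

Usage would be something like `is_sha1_collision("shattered-1.pdf", "shattered-2.pdf")`; for ordinary pairs of files, identical or not, it returns `False`.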