Better HN

0 points · groovybits · 6y ago · 0 comments

Further details as to why Torvalds is not concerned:

From the email...

"I haven't seen the attack yet, but git doesn't actually just hash the data, it does prepend a type/length field to it. That usually tends to make collision attacks much harder, because you either have to make the resulting size the same too, or you have to be able to also edit the size field in the header."

[...]

"I haven't seen the attack details, but I bet

(a) the fact that we have a separate size encoding makes it much harder to do on git objects in the first place

(b) we can probably easily add some extra sanity checks to the opaque data we do have, to make it much harder to do the hiding of random data that these attacks pretty much always depend on."
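Linus's point is that git never hashes a file's bytes on their own: every object is hashed together with a `"<type> <size>"` header and a NUL byte. Here is a minimal sketch of that computation using only Python's `hashlib`. The function names are hypothetical and are not part of git.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 over git's object header ("<type> <size>" plus a NUL byte)
    followed by the raw content."""
    header = f"{obj_type} {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


def raw_sha1(data: bytes) -> str:
    """Plain SHA-1 of the content alone, with no header."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    content = b"hello\n"
    # Should match `echo hello | git hash-object --stdin`
    print(git_object_id("blob", content))  # ce013625030ba8dba906f756967f9e9ca394464a
    print(raw_sha1(content) != git_object_id("blob", content))  # True
```

The blob ID differs from the plain SHA-1 of the same bytes. This is why two files that collide as raw bytes do not automatically collide as git blobs: the header changes the hash state before the colliding blocks are processed.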


20 comments · 6 top-level

est31 · 6y ago · 7 in thread

That first quote is misleading. Git's special hashing scheme doesn't make the attack "much harder". First, the two files of the original SHAttered collision already have the same length:

    $ curl -s https://shattered.io/static/shattered-1.pdf | wc -c
    422435
    $ curl -s https://shattered.io/static/shattered-2.pdf | wc -c
    422435

Second, SHA-1 already hashes the length into the content: its Merkle–Damgård padding appends the message length to the final block. Look up the Merkle–Damgård construction: https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_co...

Storing the length in a prefix as well does have a benefit, since it avoids length extension attacks, but it doesn't make attacks "much harder".
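Here is a sketch of SHA-1's standard padding step (FIPS 180-4, section 5.1.1) in Python that illustrates the Merkle–Damgård point. It covers only the padding, not the compression function. `hashlib` performs this step internally, so `sha1_pad` and `encoded_length` are illustrative helpers only.

```python
import struct


def sha1_pad(message: bytes) -> bytes:
    """SHA-1 message padding: append 0x80, then zero bytes until the length
    is 56 mod 64, then the original length in bits as a 64-bit big-endian
    integer. The result is a whole number of 64-byte blocks."""
    bit_length = 8 * len(message)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", bit_length)
    return padded


def encoded_length(padded: bytes) -> int:
    """Read the bit length back out of the last 8 bytes."""
    return struct.unpack(">Q", padded[-8:])[0]


if __name__ == "__main__":
    p = sha1_pad(b"abc")
    print(len(p), encoded_length(p))  # 64 24
```

Two messages of equal length, like the two SHAttered PDFs, end in identical padding. The length field therefore does nothing to stop a same-length collision.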

runeks · 6y ago

The point is that the hashed data must follow a specific data format, and can’t just be arbitrary data. This means that the collision data MUST contain the length at some specific offset in the data, which makes it harder to find a collision.

The more restrictive the serialization format of the hashed data, the harder it is to find a collision that’s valid in the given application context.
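The following hypothetical validator illustrates the kind of format constraint described above. It assumes the decompressed loose-object layout `"<type> <size>"`, a NUL byte, then the payload. It is not git's own parsing code. Whatever bytes an attacker generates must still be preceded by a header whose size field matches them exactly.

```python
def parse_git_object(raw: bytes) -> tuple[str, bytes]:
    """Split a decompressed object into (type, payload), rejecting anything
    whose header does not match the payload exactly (hypothetical check)."""
    header, sep, payload = raw.partition(b"\0")
    if not sep:
        raise ValueError("missing NUL after header")
    obj_type, space, size_field = header.partition(b" ")
    if not space or obj_type not in (b"blob", b"tree", b"commit", b"tag"):
        raise ValueError(f"bad object type: {obj_type!r}")
    # The declared size must equal the number of bytes after the NUL.
    if not size_field.isdigit() or int(size_field) != len(payload):
        raise ValueError("declared size does not match payload length")
    return obj_type.decode("ascii"), payload


if __name__ == "__main__":
    print(parse_git_object(b"blob 6\0hello\n"))  # ('blob', b'hello\n')
```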

est31 · 6y ago

Yeah, the data must start with a specific prefix, but it can otherwise contain whatever it wants. Anyway, even the SHAttered attack, which this paper says costs $11k to execute today, had a PDF-specific prefix (the part shown is identical in both files):

    $ curl -s https://shattered.io/static/shattered-1.pdf | hexdump -n 512 -C 
    00000000  25 50 44 46 2d 31 2e 33  0a 25 e2 e3 cf d3 0a 0a  |%PDF-1.3.%......|
    00000010  0a 31 20 30 20 6f 62 6a  0a 3c 3c 2f 57 69 64 74  |.1 0 obj.<</Widt|
    00000020  68 20 32 20 30 20 52 2f  48 65 69 67 68 74 20 33  |h 2 0 R/Height 3|
    00000030  20 30 20 52 2f 54 79 70  65 20 34 20 30 20 52 2f  | 0 R/Type 4 0 R/|
    00000040  53 75 62 74 79 70 65 20  35 20 30 20 52 2f 46 69  |Subtype 5 0 R/Fi|
    00000050  6c 74 65 72 20 36 20 30  20 52 2f 43 6f 6c 6f 72  |lter 6 0 R/Color|
    00000060  53 70 61 63 65 20 37 20  30 20 52 2f 4c 65 6e 67  |Space 7 0 R/Leng|
    00000070  74 68 20 38 20 30 20 52  2f 42 69 74 73 50 65 72  |th 8 0 R/BitsPer|
    00000080  43 6f 6d 70 6f 6e 65 6e  74 20 38 3e 3e 0a 73 74  |Component 8>>.st|
    00000090  72 65 61 6d 0a ff d8 ff  fe 00 24 53 48 41 2d 31  |ream......$SHA-1|
    000000a0  20 69 73 20 64 65 61 64  21 21 21 21 21 85 2f ec  | is dead!!!!!./.|

The SHAttered attack produced a so-called "identical-prefix" collision, while the Shambles paper's collision is a "chosen-prefix" one. You choose the prefix in both cases, but in the chosen-prefix case the two colliding prefixes can be entirely different (and as long as you want, by the way: the attack costs no more for a 4 KB prefix than for a 4 GB one), while in the identical-prefix case both files must share the same prefix.
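The prefix length doesn't affect the cost because a Merkle–Damgård hash only carries a fixed-size chaining value from one block to the next. The toy sketch below uses a deliberately tiny 16-bit compression function built from truncated SHA-256. This is an assumption for illustration and is not SHA-1. It also uses a generic birthday search, not the Shambles near-collision technique. It still shows the structural point: once two equal-length messages reach the same chaining value, any common suffix keeps them colliding.

```python
import hashlib
import itertools

BLOCK = 8        # toy block size in bytes (SHA-1 uses 64-byte blocks)
IV = 0x1234      # toy initial chaining value (hypothetical)


def compress(state: int, block: bytes) -> int:
    """Toy 16-bit compression function: truncated SHA-256 of state || block.
    Deliberately tiny so a birthday search finishes instantly."""
    digest = hashlib.sha256(state.to_bytes(2, "big") + block).digest()
    return int.from_bytes(digest[:2], "big")


def chain(state: int, data: bytes) -> int:
    """Feed block-aligned data through the compression function."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def toy_hash(message: bytes) -> int:
    """Merkle-Damgard hash with length padding, shaped like SHA-1 but tiny."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)          # align to a block
    padded += (8 * len(message)).to_bytes(BLOCK, "big")  # bit length
    return chain(IV, padded)


def chosen_prefix_collision(p1: bytes, p2: bytes) -> tuple[bytes, bytes]:
    """Find one block each, m1 and m2, so that p1||m1 and p2||m2 reach the
    same chaining value. Generic birthday search, NOT the Shambles method."""
    assert len(p1) == len(p2) and len(p1) % BLOCK == 0
    # The prefixes only matter through these two 16-bit states, so their
    # length adds hashing time but no search work.
    s1, s2 = chain(IV, p1), chain(IV, p2)
    table = {}
    for i in range(1 << 12):
        m1 = b"A" + i.to_bytes(BLOCK - 1, "big")
        table.setdefault(compress(s1, m1), m1)
    for j in itertools.count():
        m2 = b"B" + j.to_bytes(BLOCK - 1, "big")
        m1 = table.get(compress(s2, m2))
        if m1 is not None:
            return m1, m2


if __name__ == "__main__":
    p1, p2 = b"GOODcode", b"EVILcode"  # hypothetical equal-length prefixes
    m1, m2 = chosen_prefix_collision(p1, p2)
    tail = b"any common suffix"
    print(toy_hash(p1 + m1) == toy_hash(p2 + m2))                # True
    print(toy_hash(p1 + m1 + tail) == toy_hash(p2 + m2 + tail))  # True
```

The colliding messages must have equal total length, because the length ends up in the final padding block. A git size header imposes the same constraint.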

toyg · 6y ago

> the harder it is to find a collision that’s valid in the given application context

At a cost in the tens of thousands of dollars, an attack that becomes 10x or 100x harder is still cheap for state actors.

Assuming the NSA is at least a year or two ahead of the field, git should now accelerate its migration process.

toyg · 6y ago

Yeah, that quote doesn't exactly make me confident about Linus's understanding of this particular issue.

DarkWiiPlayer · 6y ago

It's not about the files having the same length, but about the length of the data being part of the hashed data, which Linus assumes will make it harder to find a collision. He even says at the beginning that he hasn't looked at the attack yet and is just making an assumption.


tny · 6y ago

But even if the lengths are the same, the resulting SHA-1 will be different, since you prefix the length before hashing.

est31 · 6y ago

The SHAttered prefix was chosen as well; see my other comment in the thread: https://news.ycombinator.com/item?id=21980759

The only thing that prefixing the length makes difficult is reusing the same prefix multiple times: you basically have to decide on the type and length before mounting the SHAttered attack. Also, the prefix means you have to run your own SHAttered attack and can't reuse the PDFs that Google published as proof of their project's success. The price tag for that seems to be around $11k.


wyldfire · 6y ago · 3 in thread

I think those are pretty practical approaches.

But it sounds as if the cost of changing the hash algorithm is high. What would the impact of such a change be? How many things would break if git simply changed the algorithm with each new release? Does git assume that the hash algorithm is fixed to SHA-1, or is there a setting that determines which algorithm is enabled, permitted, or configured?

paulddraper · 6y ago

After making the actual code change, the biggest problem is breaking compatibility with decades of tools in the ecosystem that rely on historically consistent SHA-1 hashes.

Git is moving to a flexible hash though. [1]

[1] https://stackoverflow.com/questions/28159071/why-doesnt-git-...
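Part of the compatibility cost is that every object ID changes when the hash function changes. The sketch below assumes that SHA-256 object names are computed over the same `"<type> <size>"` header and NUL byte, as git's hash-function-transition design describes. The `object_id` helper is hypothetical.

```python
import hashlib


def object_id(obj_type: str, data: bytes, algorithm: str = "sha1") -> str:
    """Hypothetical helper: hash "<type> <size>" + NUL + content with the
    chosen algorithm ("sha1" for classic repos, "sha256" assumed to mirror
    git's SHA-256 object format)."""
    header = f"{obj_type} {len(data)}\0".encode("ascii")
    return hashlib.new(algorithm, header + data).hexdigest()


if __name__ == "__main__":
    content = b"hello\n"
    old = object_id("blob", content, "sha1")
    new = object_id("blob", content, "sha256")
    print(len(old), len(new))  # 40 64
```

The same content maps to an unrelated, longer ID under the new algorithm. Any tool that stores or compares IDs from history would need some way to map old IDs to new ones.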

toyg · 6y ago

Maybe it's time for a version 3 that breaks a bit of compatibility.

The Python community would freak out, lol.

simias · 6y ago

The cost is very high, but it only gets higher with time. People have known that SHA-1 was weak and deprecated for much of git's existence. Doing the switch in 2010 would have been painful; doing it now would be orders of magnitude more so, and I doubt it'll get any easier by 2030 unless some other SCM manages to overtake git in popularity, which seems unlikely at this point.

Unless Linus really believes that git will be fine using SHA-1 for decades to come I don't think it's very responsible to keep kicking the ball down the road waiting for the inevitable day when a viable proof of concept attack on git will be published and people will have to emergency-patch everything.

bjornsing · 6y ago · 2 in thread

> you have to be able to also edit the size field in the header.

As I read the OP [1] a chosen-prefix collision attack such as this allows you to “edit the size field in the header”. Or am I missing something?

1. “A chosen-prefix collision is a more constrained (and much more difficult to obtain) type of collision, where two message prefixes P and P’ are first given as challenge to the adversary, and his goal is then to compute two messages M and M’ such that H(P || M) = H(P’ || M’), where || denotes concatenation.”

EDIT: On second thought, I was missing something: the adversary is further constrained in the git case because it must find M and M’ of the correct length (as specified in P and P’). Linus is right (as usual); this probably makes it much harder.

bjornsing · 6y ago

A few emails further into the thread, though, Linus explains why we don’t need to worry much about this attack in practice: https://marc.info/?l=git&m=148787287624049&w=2

This argument sounds sound to me.

Thorrez · 6y ago

His argument assumes the file is text and people read the entire thing. If either of those assumptions are false, then it's not safe.

People store things in git that aren't text. Therefore it's not safe.

tedunangst · 6y ago · 2 in thread

What if somebody makes an attack where they can choose the size and then find a collision?

banana_giraffe · 6y ago

Like the two files on the linked page?

Skunkleton · 6y ago

The two files on the linked page were both full of junk data. I suspect that those files being of the same length isn't the norm.


simias · 6y ago

The fact that this attack is chosen-prefix does weaken the first argument, though: you may now find a collision even when accounting for any prefixed git "header". The rest is still completely valid.

I still feel like they really should have taken this problem seriously, and earlier. The longer we wait, the more painful the migration will be when the day comes to move to a different hash function, because everybody knows that will happen sooner or later. Two years ago we had a collision, now we have a chosen-prefix collision; how much longer until somebody actually manages to make a git object collision?

And keep in mind that public research is probably several years behind top-secret state agency capabilities. Let's stop looking for excuses every time SHA-1 takes a hit and rip off the band-aid already. It's going to be messy and painful, but it has to be done.


joeyh · 6y ago

"(b)" is kind of amusing. It has been known since 2011 that collision-generating garbage can be put after a NUL in a git commit message and hidden from git log, git show, etc. Still not fixed.

With this chosen-prefix attack, they chose two prefixes and generated collisions by appending some data. So your two prefixes just need to be "tree {GOOD,BAD}\nauthor foo\n\nmerge me\0"

The only thing preventing injecting a backdoor into a pull request now seems to be git's use of hardened sha1.
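The trick described above can be sketched as follows. The commit header is a hypothetical placeholder using the example prefix from the comment; a real commit needs a full tree ID plus author and committer lines. `shown_message` only simulates a viewer that stops at the first NUL, which is the behaviour the comment attributes to `git log` and `git show`. The hash, however, covers every byte.

```python
import hashlib

# Hypothetical commit header lines taken from the comment's example prefix;
# a real commit needs a 40-hex tree ID plus author and committer lines.
HEADER = b"tree GOOD\nauthor foo\n\n"


def commit_id(body: bytes) -> str:
    """SHA-1 over git's "commit <size>" header, a NUL byte, and the body."""
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


def shown_message(body: bytes) -> bytes:
    """Simulate a viewer that stops printing the message at the first NUL
    (the behaviour the comment attributes to git log / git show)."""
    message = body.split(b"\n\n", 1)[1]
    return message.split(b"\0", 1)[0]


if __name__ == "__main__":
    a = HEADER + b"merge me\0" + b"hidden bytes A"
    b = HEADER + b"merge me\0" + b"hidden bytes B"
    print(shown_message(a), shown_message(b))  # b'merge me' b'merge me'
    print(commit_id(a) == commit_id(b))        # False
```

Both bodies display as `merge me`, yet their IDs differ. The hidden tail is therefore a place where collision blocks could sit unnoticed.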
