rawling 2y ago

I'm confused as to how they've done this. The original message, sure, you can brute force the digits and hope you get a collision, and try a new, plausible preamble if not. But I don't see how they've found this collision without it looking like anything is brute forced.

Is it using substitute Unicode characters or something?

E: no, just hand typed it and got the same...

12 comments · 7 top-level

devit 2y ago · 3 in thread

Presumably generated lots of variations of that sentence. It's 28 bits, so you only need to have around 14 places where 4 variations are possible.

For instance, it could start with "Was", "I was", "verifying" could be "checking" or "computing" or "testing", etc.

A bit tight, but it seems feasible with some work.
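A minimal sketch of that variation search, assuming a small made-up set of interchangeable phrasings (the `SLOTS` list and the `find_variations` helper are illustrative, not anything taken from the original tweet). It tries every combination and keeps the ones whose SHA-256 hex digest starts with a chosen prefix. A 7-digit target would need on the order of 2^28 combinations rather than the handful shown here.

```python
import hashlib
import itertools
import math

# Hypothetical variation slots: each inner list holds interchangeable phrasings.
# The last slot is closing punctuation and is attached without a space.
SLOTS = [
    ["Was", "I was"],
    ["verifying", "checking", "computing", "testing"],
    ["the hash", "the SHA256", "the digest"],
    ["of this sentence", "of this text"],
    [".", "!", "..."],
]


def find_variations(slots, target_prefix):
    """Yield (sentence, digest) for every combination whose digest starts with target_prefix."""
    for choice in itertools.product(*slots):
        sentence = " ".join(choice[:-1]) + choice[-1]
        digest = hashlib.sha256(sentence.encode("utf-8")).hexdigest()
        if digest.startswith(target_prefix):
            yield sentence, digest


# Size of the search space: the product of the number of options per slot.
print("combinations:", math.prod(len(options) for options in SLOTS))

# With so few combinations, only a one-digit prefix is realistic to hit.
for sentence, digest in find_variations(SLOTS, "a"):
    print(digest[:8], sentence)
```

Each extra slot with four options adds two bits of search space, which is where the "14 places with 4 variations" figure comes from.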

aib 2y ago

Indeed, this is how I did it. 2^28 is around 270 million. With little more than a handful options, it should be possible. Although I have to say, it turned out to be more difficult than I'd initially thought. Maybe it's better to think of it as 28 different boolean choices.

  echo -n "Indeed. This is how I managed to do it. 2^28 is around 300 million. With only a handful options, it's possible. Although, I must say that it turned out to be more difficult than I'd initially thought. Perhaps it's better to think of it as 28 different alternatives." | sha256sum

Alextigtig 2y ago

This is crazy! The hash of this comment once again begins 182a7c9!!! Well done indeed!

fbdab103 2y ago

Throwing a loop of numbers on the end seems far more feasible. I put together a hasty Python implementation which can immediately find three hits at 5 characters. tqdm is reporting ~450k tries per second, so depending on how lucky you are, you would probably want to redo it in a faster language to solve for 7+.

    import hashlib
    import itertools

    import tqdm

    BLOCKS_SIZE = 5  # number of leading hex digits the sentence must predict
    sentence_prefix = "The SHA256 for this sentence begins with:"

    # Spell decimal digits as words; a-f stay as single letters.
    itos = {0x00:"zero", 0x01:"one", 0x02:"two", 0x03:"three", 0x04:"four", 0x05:"five", 0x06:"six", 0x07:"seven", 0x08:"eight", 0x09:"nine",
           0x0a:"a", 0x0b:"b", 0x0c:"c", 0x0d:"d", 0x0e:"e", 0x0f:"f"}
    # Try every digit sequence; passing total lets tqdm show progress and an ETA.
    candidates = itertools.product(itos.keys(), repeat=BLOCKS_SIZE)
    for nums in tqdm.tqdm(candidates, total=len(itos) ** BLOCKS_SIZE):
        sentence = f"{sentence_prefix} {', '.join(itos[num] for num in nums[:-1])}, and {itos[nums[-1]]}."
        hash_true = hashlib.sha256(bytes(sentence, "utf8")).hexdigest()
        guessed_prefix = "".join(f"{n:x}" for n in nums)
        true_prefix = hash_true[:BLOCKS_SIZE]
        # A hit means the sentence correctly states the start of its own hash.
        if guessed_prefix == true_prefix:
            print("collision")
            print(sentence)
            print(hash_true)
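A rough back-of-the-envelope sketch for sizing this kind of search, assuming the digests behave like uniformly random values (a modelling assumption, not a measurement) and taking the ~450k tries per second quoted above as the rate. The `search_stats` helper is illustrative. With n digits there are 16^n candidate sentences and each one matches its own prefix with probability 16^-n, so the chance of at least one hit stays near 63% whatever n is; only the sweep time grows.

```python
import math

TRIES_PER_SECOND = 450_000  # rate quoted in the comment above; varies by machine


def search_stats(digits, rate=TRIES_PER_SECOND):
    """Return (candidates, seconds for a full sweep, probability of at least one hit)."""
    candidates = 16 ** digits
    seconds = candidates / rate
    # Each candidate independently matches with probability 16**-digits (assumed).
    p_single = 16.0 ** -digits
    p_any = -math.expm1(candidates * math.log1p(-p_single))
    return candidates, seconds, p_any


for digits in (5, 6, 7, 8):
    candidates, seconds, p_any = search_stats(digits)
    print(f"{digits} digits: {candidates:>13,} sentences, "
          f"~{seconds / 60:.1f} min sweep, P(at least one hit) ~ {p_any:.2f}")
```

Under these assumptions a full 7-digit sweep is about ten minutes at the quoted rate, and trying a second sentence prefix is the cheap way to recover from a sweep that finds nothing.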

hinkley 2y ago · 2 in thread

All we are demonstrating here is why SHA256 is 256 bits and not 32 bits. We have trivially identified collisions for the first 28 bits of the output, which is only 11% of the entire hash size.

The difficulty of a collision roughly doubles with each additional bit. If we had a SHA32, a collision would be 16 times harder to achieve. For the full SHA256 it is 43 with 67 zeroes behind it times more difficult than the examples here.
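The arithmetic behind those figures can be checked in a few lines. This is only a sketch of the brute-force cost of matching a fixed number of leading bits, with a hypothetical helper name `extra_difficulty`; it does not model smarter attacks.

```python
MATCHED_BITS = 28  # seven hex digits, four bits each


def extra_difficulty(total_bits, matched_bits=MATCHED_BITS):
    """How many times harder it is to match total_bits than matched_bits by brute force."""
    return 2 ** (total_bits - matched_bits)


print(extra_difficulty(32))             # 16
print(f"{extra_difficulty(256):.2e}")   # 4.31e+68, i.e. 43 followed by 67 zeroes
print(f"{MATCHED_BITS / 256:.1%}")      # 10.9%
```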

rawling (OP) 2y ago

Yeah, I forgot how many bits were in a hex digit and convinced myself it was much harder than it really is.

hughesjj 2y ago

I also mix up base 16 with either base 256 or base 64 when doing Fermi estimates.

noctune 2y ago

7 digits of the hash makes for 16^7 possible hashes. I spot 4 potential "filler" lines in that tweet, so if you find 16^(7/4) = 128 candidates for each of those filler lines, the 128^4 = 16^7 combinations would be expected to yield that hash once. Equivalently, log4(16^7) = 14 places with 4 candidates each would do.

petercooper 2y ago

I just did an extremely lazy version of that and posted a reply of "And the SHA-1 digest (in hex) of this tweet starts BEEF" - https://twitter.com/cooperx86/status/1701261047917633846

Basically I had several substitutions around words, case, punctuation, etc. and just ran it until it found some hits. It's quite easy with just four characters, but it was only a proof of concept.

nstbayless 2y ago

It's possible that the tweets were actually produced together somehow. This might buy just enough search space between the two of them.

kazinator 2y ago

You can generate sentences of the form "This sentence begins with: " followed by seven comma-separated English words denoting hex digits. Then search that space of digits until you get a hit.

For each digit combination, you can try it with multiple variations of the sentence like "The SHA256 of this sentence begins with", "The SHA256 hash of this text starts with" and many more. That increases the search space without increasing the number of digits that have to match, making it more likely that a hit is found.
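A minimal sketch of that combined search, assuming a few made-up sentence templates (the `TEMPLATES` list and the `spell` and `search` helpers are illustrative). Every template is tried with every digit sequence, so the number of candidates grows with the number of templates while the number of digits that must match stays the same.

```python
import hashlib
import itertools

HEX_DIGITS = "0123456789abcdef"
HEX_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "a": "a", "b": "b", "c": "c", "d": "d", "e": "e", "f": "f",
}

# Hypothetical sentence templates; {} is replaced by the spelled-out digits.
TEMPLATES = [
    "The SHA256 of this sentence begins with: {}.",
    "The SHA256 hash of this text starts with: {}.",
    "The SHA-256 digest of this sentence starts with: {}.",
]


def spell(digits):
    """Turn a hex string such as '0f3' into 'zero, f, and three'."""
    words = [HEX_WORDS[d] for d in digits]
    if len(words) == 1:
        return words[0]
    return ", ".join(words[:-1]) + ", and " + words[-1]


def search(templates, n_digits):
    """Yield (sentence, digest) for sentences that correctly state their own hash prefix."""
    for template in templates:
        for combo in itertools.product(HEX_DIGITS, repeat=n_digits):
            digits = "".join(combo)
            sentence = template.format(spell(digits))
            digest = hashlib.sha256(sentence.encode("utf-8")).hexdigest()
            if digest.startswith(digits):
                yield sentence, digest


# Three digits keeps the demo fast; seven digits needs far more time or a faster language.
for sentence, digest in search(TEMPLATES, 3):
    print(digest[:8], sentence)
```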

JamesSwift 2y ago

I think this is a really good use case for ChatGPT: generate a massive number of variations that you then feed into a validation function.