A webshell and a normal file that have the same MD5

(github.com)

98 points by shlomo_z 7mo ago | 47 comments

Dwedit 7mo ago

Proof of Concept or GTFO issue 0x14 is a PDF file that can also be run as a NES ROM. It displays its own MD5 hash in a PDF viewer, and also in a NES emulator (only the first 40KB+16 bytes are actually loaded there).

https://github.com/angea/pocorgtfo#0x14

And yes, documents are not normally supposed to be able to display their own MD5 hash.

Retr0id 7mo ago

I made https://github.com/DavidBuchanan314/monomorph, which packs up to 4KB of shellcode into an executable that always has the same hash. So you're not just limited to a good/evil pair, you can arbitrarily change the behaviour in future without changing the hash.

Also, a more recent innovation in MD5 collisions is textcoll, which creates colliding blocks that are completely plaintext. This would allow for colliding PHP source files like in OP but without any obvious binary artefacts (although this requires identical prefixes).

https://github.com/cr-marcstevens/hashclash?tab=readme-ov-fi...
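To see what a colliding pair like the one in the OP means in practice, here is a minimal Python sketch that hashes two files with both MD5 and SHA-256 and reports which digests agree. The file paths are whatever pair you downloaded; nothing here comes from the repo itself. For a genuine MD5 collision pair you would expect `md5` to be True and `sha256` to be False.

```python
import hashlib
import sys


def digests(path: str) -> dict[str, str]:
    """Return MD5 and SHA-256 hex digests of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


def compare(path_a: str, path_b: str) -> dict[str, bool]:
    """Report, per algorithm, whether the two files hash to the same value."""
    a, b = digests(path_a), digests(path_b)
    return {algo: a[algo] == b[algo] for algo in a}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare.py normal.php webshell.php
    print(compare(sys.argv[1], sys.argv[2]))
```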

magicalhippo 7mo ago

Not only is MD5 broken, as shown here, but if you have a modern CPU it's also quite slow compared to good, non-broken alternatives. See for example this comparison[1] (the post says JavaScript, but it's actually OpenSSL's implementation that's being tested).

[1]: https://lemire.me/blog/2025/01/11/javascript-hashing-speed-c...

gruez 7mo ago

I only see new CPUs benchmarked; maybe that's because newer CPUs have SHA acceleration extensions? I'd expect SHA256 to be more complex and therefore more computationally expensive.

sltkr 7mo ago

Yes, SHA256 is faster than MD5 only if you have hardware acceleration. But SHA256 itself is pretty slow compared to the state of the art. For example, BLAKE3 is just as secure as SHA256 but an order of magnitude faster.

Try this on your own system:

    $ head -c 1000000000 /dev/urandom > random-1gb
    
    $ time md5sum random-1gb 
    ef72a3616aad5117ddf40a7d5f5d0162  random-1gb
    
    real 0m2.428s
    user 0m2.192s
    sys 0m0.202s
    
    $ time sha256sum random-1gb 
    ec7d7f31c4489acae8328fddbe54157f1cb9e97b220ef502a07e1f9230969310  random-1gb
    
    real 0m3.894s
    user 0m3.697s
    sys 0m0.181s
    
    $ time b3sum random-1gb 
    11fe11cc5721faf65369d18893d7b7631f6178b4692bc0bb03b1b180273cd384  random-1gb
    
    real 0m0.282s !!!
    user 0m0.876s
    sys 0m0.124s
    
    $ time b3sum --num-threads=1 random-1gb 
    11fe11cc5721faf65369d18893d7b7631f6178b4692bc0bb03b1b180273cd384  random-1gb
    
    real 0m0.597s
    user 0m0.488s
    sys 0m0.107s

This is on an old Chromebook with an Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz (dual core, with hyperthreading). Note that even using only a single thread (which SHA256 and MD5 are limited to by their design), BLAKE3 is 6x as fast as SHA256 and 4x as fast as MD5.
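For readers who would rather repeat the comparison from Python, here is a rough sketch using only `hashlib`, assuming CPython with OpenSSL-backed MD5 and SHA-256. BLAKE3 is not in the standard library: the sketch tries the third-party `blake3` package if it happens to be installed and otherwise skips it. It also times BLAKE2b, which *is* in `hashlib` but is an older, different function than BLAKE3. Numbers depend on CPU, OpenSSL build and whether SHA extensions are present, so treat them as indicative only.

```python
import hashlib
import os
import time
from typing import Callable


def throughput_mb_s(hash_fn: Callable[[bytes], bytes], data: bytes,
                    repeats: int = 3) -> float:
    """Best-of-`repeats` throughput, in MB/s, of hashing `data` once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hash_fn(data)
        best = min(best, time.perf_counter() - start)
    return len(data) / max(best, 1e-9) / 1e6


def run(size: int = 256 * 1024 * 1024) -> dict[str, float]:
    """Hash `size` random bytes with each available algorithm."""
    data = os.urandom(size)
    results = {
        "md5": throughput_mb_s(lambda d: hashlib.md5(d).digest(), data),
        "sha256": throughput_mb_s(lambda d: hashlib.sha256(d).digest(), data),
        # BLAKE2b ships with hashlib; it is BLAKE3's predecessor, not BLAKE3.
        "blake2b": throughput_mb_s(lambda d: hashlib.blake2b(d).digest(), data),
    }
    try:
        from blake3 import blake3  # optional third-party package: pip install blake3
    except ImportError:
        return results
    results["blake3"] = throughput_mb_s(lambda d: blake3(d).digest(), data)
    return results
```

Calling `run()` hashes 256 MiB of random data with each algorithm and returns MB/s figures; pass a smaller `size` on low-memory machines.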

edgineer 7mo ago

>BLAKE3 is just as secure as SHA256 but an order of magnitude faster

Is this not an oxymoron? E.g. b3 then ought to be an order of magnitude easier to brute force.
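On the brute-force question: for generic attacks on an ideal hash, the cost is set by the output size (about 2^n evaluations for a preimage, 2^(n/2) for a collision), so a function that is k times faster only shaves log2(k) bits off the attacker's work. The sketch below is just that arithmetic, not a statement about any specific cryptanalysis of BLAKE3 or SHA-256. Speed does matter when the input itself is guessable, such as passwords, which is why those go through deliberately slow KDFs rather than a bare hash.

```python
import math


def effective_bits(output_bits: int, speedup: float,
                   attack: str = "collision") -> float:
    """Approximate work factor (in bits) of a generic attack on an ideal hash.

    Generic attacks cost about 2^n hash evaluations for a preimage and
    2^(n/2) for a collision (birthday bound). A hash that is `speedup`
    times faster to evaluate only removes log2(speedup) bits from that.
    """
    if attack not in ("preimage", "collision"):
        raise ValueError(f"unknown attack: {attack}")
    base = output_bits if attack == "preimage" else output_bits / 2
    return base - math.log2(speedup)
```

For example, `effective_bits(256, 10)` comes out at about 124.7 bits, versus 128 for the slower hash.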

adrian_b 7mo ago

Unlike SHA-256, BLAKE3 can be evaluated in parallel, so the speedup factor over SHA-256 depends on the number of available CPU cores.

While BLAKE3 can be many times faster than SHA-256 by consuming many times more power, the amount of work needed to compute a hash differs much less between the two hashes than the execution time on a multi-core CPU does.

The speed difference you quoted for a single thread is caused by your Skylake-based CPU, which does not have the SHA hardware instructions.

Moreover, even programs that claim to use the SHA hardware instructions may run several times slower than the hardware allows, because more recent CPUs (e.g. from the last 4 years) have wider SHA instructions than older CPUs, and programs must have been compiled to support such CPUs, e.g. Zen 3 and newer or Alder Lake and newer.
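To illustrate why a tree-structured hash parallelizes while SHA-256 (a sequential Merkle-Damgård chain) does not, here is a toy sketch: independent leaves are hashed concurrently with SHA-256 and then combined pairwise. This is NOT BLAKE3 (which uses 1 KiB chunks, its own compression function and its own domain flags), its output matches nothing standard, and it is not a vetted construction; it only shows the structure. It relies on CPython's `hashlib` releasing the GIL for large buffers, so the threads can actually run in parallel.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB leaves in this toy; BLAKE3 itself uses 1 KiB chunks


def _leaf(item: tuple[int, bytes]) -> bytes:
    index, chunk = item
    # Prefix 0x00 marks a leaf; the index binds each leaf to its position.
    return hashlib.sha256(b"\x00" + index.to_bytes(8, "little") + chunk).digest()


def _parent(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks an interior node, so leaves and parents can't be confused.
    return hashlib.sha256(b"\x01" + left + right).digest()


def tree_hash(data: bytes, workers: int = 4) -> str:
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    # Leaves are independent of each other, so they can be hashed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        level = list(pool.map(_leaf, enumerate(chunks)))
    # Combine pairwise until one root remains; an odd last node is carried up.
    while len(level) > 1:
        nxt = [_parent(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0].hex()
```

The result does not depend on `workers`; only the wall-clock time does, which is the property being discussed above.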

adrian_b 7mo ago

Hardware SHA-1 and SHA-256 are now supported by many CPUs, many of which are already more than a decade old, including almost all 64-bit ARM-based CPUs, all AMD Zen CPUs, many generations of Intel Atom, and the Intel Core CPUs starting with Ice Lake.

The only CPUs still likely to be in use and without SHA support are the Intel Core CPUs up to and including the Skylake derivatives (i.e. up to Comet Lake, about 6 years ago).

Intel Atoms received SHA support many years before Intel Core did, because they competed with ARM, which already had such support.

Support in Intel Core was added because of AMD Zen, but the products that had it were delayed until 2019/2020 by Intel's failure to achieve acceptable fabrication yields on its 10 nm CMOS process.
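If you want to check whether your own CPU advertises the SHA extensions discussed above, a minimal Linux-only sketch is to look at /proc/cpuinfo: on x86 the kernel reports the flag `sha_ni`, and on 64-bit ARM the `Features` line lists `sha1`/`sha2` (and `sha512` on newer cores). Other operating systems expose this differently, and a flag being present does not guarantee that your hashing library was built to use it.

```python
SHA_FLAGS = {"sha_ni", "sha1", "sha2", "sha512"}


def sha_extensions(cpuinfo_text: str) -> set[str]:
    """SHA-related feature names found in /proc/cpuinfo-style text.

    x86 kernels list `sha_ni` on the `flags` line; arm64 kernels list
    `sha1`, `sha2` (and `sha512` where supported) on the `Features` line.
    """
    found: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in ("flags", "Features"):
            found |= SHA_FLAGS & set(value.split())
    return found


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Return the file's text, or "" where it doesn't exist (e.g. non-Linux)."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.read()
    except OSError:
        return ""
```

Usage: `print(sha_extensions(read_cpuinfo()) or "no SHA extensions reported")`.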

andreareina 7mo ago

The normal file doesn't look that normal

o11c 7mo ago

Keep in mind that the stated use is cache-poisoning of automated scanners, not fooling humans.

slow_typist 7mo ago

Humans have to put the so-called PHP file on the server intentionally for any subsequent attack to work. But it is a binary file.

h33t-l4x0r 7mo ago

I imagine it's supposed to get onto the server by an exploited vulnerable image upload plugin

Incipient 7mo ago

The idea here is you can trigger a server to run the "safe" php file, then send it the webshell version, which passes hash based scanning?

chipsrafferty 7mo ago

Yes, but you'd need a situation where:

1. You can upload scripts that get scanned for malicious code.
2. These scripts can be executed once deemed "safe".
3. The server is using MD5 hashes to determine if you uploaded the same file or if it should re-scan it.

Point 3 is where the issue is. It should probably always re-scan, and it definitely should not be using MD5.

falcor84 7mo ago

>The server is using MD5 hashes to determine if you uploaded the same file or if it should re-scan it

Wouldn't the sensible thing for a server that gets an upload matching an existing file's hash be to just treat it as an idempotent no-op? What reason would it have to replace the old version with a presumably identical copy? What am I missing?

dnet 7mo ago

I assume the scanner is a separate library/service that receives the contents and returns a boolean safe/malicious result, and the implementation using MD5 to avoid expensive re-scans is an internal detail hidden from the caller.
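The pattern dnet describes can be sketched as a scanner wrapper whose verdict cache is keyed by a digest of the content. `CachedScanner`, `md5_key` and `sha256_key` are hypothetical names for illustration, not taken from any real product. With an MD5 key, a benign file scanned first can "pre-approve" a colliding webshell; keying on SHA-256 removes that shortcut.

```python
import hashlib
from typing import Callable


class CachedScanner:
    """Wraps an expensive scanner with a verdict cache keyed by a content digest.

    `scan` returns True if the content looks malicious; `digest` produces the
    cache key. Both are hypothetical stand-ins supplied by the caller.
    """

    def __init__(self, scan: Callable[[bytes], bool],
                 digest: Callable[[bytes], str]) -> None:
        self._scan = scan
        self._digest = digest
        self._verdicts: dict[str, bool] = {}

    def is_malicious(self, content: bytes) -> bool:
        key = self._digest(content)
        if key not in self._verdicts:     # cache miss: run the real scan
            self._verdicts[key] = self._scan(content)
        return self._verdicts[key]        # cache hit: trust the earlier verdict


def md5_key(content: bytes) -> str:
    return hashlib.md5(content).hexdigest()     # collidable: the weak spot


def sha256_key(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()  # no known collisions
```

For example, with `digest=lambda c: "x"` standing in for a collision, a second, malicious upload gets the cached "clean" verdict without ever being scanned.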

szszrk 7mo ago

Is there any fairly popular software that still uses md5 in this context?

Most I've seen (sec scans, backup validation/dedup etc.) pushed to phase out md5 a very long time ago.

h33t-l4x0r 7mo ago

Wordpress uses md5 checksums for core files. That doesn't make it vulnerable to this, except in the sense that it kind of validates using them.

IshKebab 7mo ago

There's no need to rescan. You just need to use a secure hash.

jgalt212 7mo ago

Secure for now, rather. A solid game plan would be to have your code base set up to easily swap in a new hashing method when called for. I believe Django automatically promotes passwords stored with insecure hashes to secure ones the next time a user logs in.
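jgalt212's suggestion of making the hash swappable can be sketched by storing the algorithm name alongside each digest and re-hashing with the preferred algorithm whenever a stored value still verifies. The `algo$hex` format and the function names are hypothetical, loosely modelled on the idea behind Django's password-hash upgrading rather than on Django's actual code; for passwords you would use a slow KDF (Django defaults to PBKDF2), not a bare hash. Also note that for file integrity, an old MD5 match says nothing against an attacker who can build collisions, so upgrading only helps for content you already trust.

```python
import hashlib
import hmac

PREFERRED = "sha256"
ALLOWED = {"md5", "sha1", "sha256", "sha3_256", "blake2b"}


def make_digest(data: bytes, algo: str = PREFERRED) -> str:
    """Store the algorithm name next to the digest, e.g. 'sha256$ab12...'."""
    if algo not in ALLOWED:
        raise ValueError(f"unsupported algorithm: {algo}")
    return f"{algo}${hashlib.new(algo, data).hexdigest()}"


def verify_and_upgrade(data: bytes, stored: str) -> tuple[bool, str]:
    """Check `data` against `stored`. If it matches but uses an outdated
    algorithm, return a fresh PREFERRED digest so the caller can save it."""
    algo, _, expected = stored.partition("$")
    if algo not in ALLOWED:
        raise ValueError(f"unsupported algorithm: {algo}")
    actual = hashlib.new(algo, data).hexdigest()
    ok = hmac.compare_digest(actual, expected)  # constant-time comparison
    if ok and algo != PREFERRED:
        return True, make_digest(data)
    return ok, stored
```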

sim7c00 7mo ago

the safe file is not a valid php file? it might be executed if php, like javascript, ignores invalid chars, but i doubt something actually 'looking at it' would accept it as benign or valid.

dsab 7mo ago

It's a pity that there is no description of what it is supposed to be used for.

lisper 7mo ago

If you don't know, then you aren't the target audience.

But there are two applications: the first is breaking into a system under some very obscure set of circumstances that you are very unlikely to encounter in the real world. The second is to bump up your karma on HN.

bawolff 7mo ago

> If you don't know, then you aren't the target audience.

If you do know, then you also know md5 being broken is really really old news.

Seriously. Cryptographers have been warning that md5 seems weak since 1996. There are probably people reading this thread who weren't even alive yet. (It got totally broken in 2004 but the warning signs were way earlier).

ramses0 7mo ago

Someone with more karma motivation could post this as a top level story, but Plex offers to validate their Debian public key via MD5: https://support.plex.tv/articles/235974187-enable-repository...

Such security! Much wow!

alkonaut 7mo ago

> system under some very obscure set of circumstances that you are very unlikely to encounter in the real world.

Is there any way to use HN karma? Like, can I sell my account on some shady exchange like people sell big twitter accounts? And if I can, what's the going rate for internet points these days? Asking for an unscrupulous friend.

lisper 7mo ago

> Is there any way to use HN karma?

Nothing other than vanity AFAIK.

It's actually a bit of a scam because karma accumulates and never expires. I've been on the leaderboard for a long time, not because I'm making particularly valuable contributions (I only post a few times a week) but just because I've been on HN since it launched.

integralid 7mo ago

Sometimes, after the initial scan, the security and AV industry deals with file hashes, not the actual files. This means that if you wrote a legitimate, harmless program and a malicious version with the same hash, you would be able to fool security tools in many cases. Basically, those two files would look the same to the security program.

The thing that makes this blog post not realistic is:

* Such tricks would make much more sense with normal programs, where you're trying to trick a user into downloading and executing them. Webshells are downloaded by the attacker knowingly.

* MD5 is not used anymore (although I know security vendors who used it for an embarrassingly long time). If this were SHA256, the attack would be devastating for many more severe reasons.

But it's still a fun PoC.

chipsrafferty 7mo ago

Because there's unlikely to be a use case

h4ck_th3_pl4n3t 7mo ago

The answer is likely wordpress, because its default wp_hash algorithm is still MD5.

0points 7mo ago

> The answer is likely wordpress, because its default wp_hash algorithm is still MD5.

That's only true if you ignore all the details.

As usual, you cannot make a coherent understanding on just about any subject by reading headlines alone. Life would have taught you by now that the devil is in the details.

WP uses salt and multiple rounds of hashing, fully mitigating the MD5 collisions that are the topic of discussion here.

So no, wp doesn't "use md5" in the sense that they would be vulnerable to this type of attack.

Source: https://developer.wordpress.org/reference/functions/wp_hash_...

h4ck_th3_pl4n3t 7mo ago

Your source described wp_hash_password(), not wp_hash().

As the OP article/PoC is about hashing uploaded files, not passwords btw, I think you should read it again.

Because as I pointed out, wp_hash() is used to check against uploaded files.

Oh, and source: https://developer.wordpress.org/reference/functions/wp_hash/
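On the wp_hash() vs wp_hash_password() point: as far as I know, wp_hash() returns an HMAC of the data keyed with a site-specific salt from wp_salt(), using MD5 by default. A rough Python analogue, with a made-up salt, is below. One relevant property: an identical-prefix MD5 collision is computed for a known starting state, and HMAC prefixes the message with a secret key block, so a pair that collides under plain MD5 is not expected to collide under HMAC-MD5 with an unknown key. Whether and where WordPress applies wp_hash() to uploads is a separate question this sketch does not answer.

```python
import hashlib
import hmac


def keyed_md5(data: bytes, salt: bytes) -> str:
    """Rough analogue of PHP's hash_hmac('md5', $data, $salt).

    `salt` stands in for a secret, site-specific value (hypothetical here).
    """
    return hmac.new(salt, data, hashlib.md5).hexdigest()


# Same data, different secret salts: the outputs are unrelated.
site_a = keyed_md5(b"uploaded file contents", b"example-salt-A")
site_b = keyed_md5(b"uploaded file contents", b"example-salt-B")
```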

And as I cannot resist quoting you for trying to smartass while literally not having read the source code the PoC was about:

> As usual, you cannot make a coherent understanding on just about any subject by reading headlines alone. Life would have taught you by now that the devil is in the details.

downtown_ 7mo ago

This is not related to password hashing.

high_na_euv 7mo ago

Literally in this "article"

>Can use it bypass some cached webshell detections.

eptcyka 7mo ago

> As usual, you cannot make a coherent understanding on just about any subject by reading headlines alone.

The amount of sweet, sweet irony displayed here will make me diabetic. Did you read the article at all? Salting? What are you on about?

Honestly, it feels like some HN commenters are LLMs instructed to defend a given entity.

IshKebab 7mo ago

It says at the end of the README:

> Can use it bypass some cached webshell detections.

sim7c00 7mo ago

honestly, normal.php is not a valid php file. i do understand that it might bypass some checks if, say, normal.php was somehow flagged as a valid / benign file, but in all honesty that would be a really bad sec product that you'd want to swap for something that classifies files more intelligently... additionally, most products these days also use sha1, sha2 and sometimes things like ssdeep, so they have multiple hash variants to check. this mitigates collisions, since it's not yet known how to make one file match on all of these different types of hashes, even though collisions are certainly possible in a number of them.
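The point about checking several digests at once can be sketched as an allowlist keyed on the (MD5, SHA-1, SHA-256) tuple. This is an illustration, not any vendor's actual scheme, and ssdeep (a fuzzy hash) is left out because it needs a third-party library. Worth knowing: concatenating iterated hashes is not much stronger than the strongest one alone (Joux's multicollision result), so in practice the protection here comes from SHA-256.

```python
import hashlib


def fingerprints(content: bytes) -> tuple[str, str, str]:
    """MD5, SHA-1 and SHA-256 hex digests of the same content."""
    return (hashlib.md5(content).hexdigest(),
            hashlib.sha1(content).hexdigest(),
            hashlib.sha256(content).hexdigest())


def known_benign(content: bytes, allowlist: set[tuple[str, str, str]]) -> bool:
    """Treat content as known-benign only if all three digests match a
    single allowlisted entry, so an MD5-only collision is not enough."""
    return fingerprints(content) in allowlist
```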

if normal.php had actual php code in it, being really 'normal' as the name implies, this would be much more severe / interesting, because it might be easier to convince modern security products that it's actually a benign file.

Currently, if it were analysed, it would be flagged as suspicious simply because it's not a valid file. and really, it doesn't need to be php; it could be any valid file format, as long as it's an actual file with benign behavior or contents.

plaintext might be easier to generate, but you'd need it to be in an 'executable' format or something interpretable like a script to have it actually stored in databases marking files as malicious or benign. matching the malicious file's filetype, in a valid form that does actual benign behavior, would be 'best'.

don't get me wrong tho. still fun to see these things and honestly props, if it bypasses anything that's always a 'nice result' :)

Blahagun 7mo ago

normal.php is a perfectly valid php file. Sure, it doesn't contain php code but that doesn't make it invalid php file. If it did have <?php somewhere and if the following wasn't a syntactically valid PHP code, then you could say it's not a valid php file.
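Blahagun's point can be made concrete: PHP passes everything outside an open tag through as literal output, so a file with no `<?php`, no `<?=` (and, with short_open_tag enabled, no bare `<?`) is valid PHP that simply echoes itself. A minimal lexical heuristic for "does the interpreter see any code here?" might look like this; it is a sketch, not a parser, and a binary blob can contain these byte sequences by accident.

```python
import re

# `<?php` (case-insensitive) must be followed by whitespace or end of input;
# `<?=` has been enabled unconditionally since PHP 5.4.
_STANDARD_OPEN = re.compile(rb"<\?(?:php(?:\s|\Z)|=)", re.IGNORECASE)
# With short_open_tag=On, a bare `<?` also opens a code block.
_SHORT_OPEN = re.compile(rb"<\?")


def has_php_code(content: bytes, short_open_tag: bool = False) -> bool:
    """Heuristic: would the PHP interpreter find any code in `content`?

    Bytes outside an open tag are echoed verbatim, so a file without any
    open tag is valid PHP that just outputs itself.
    """
    pattern = _SHORT_OPEN if short_open_tag else _STANDARD_OPEN
    return pattern.search(content) is not None
```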

sim7c00 7mo ago

yeah ok, fair point, from the interpreter's perspective. but that is not the tool that checks security. in that context validity is determined by another tool, which will look beyond merely being interpretable by the php interpreter.

it's funny how often web-based languages have this property tho, i mean, how else are you gonna poison logs and execute them :')... js and php are just adorable for providing opportunities :D
