PNG Polyglot Files

(gildas-lormeau.github.io)

106 points · gildas · 1y ago · 37 comments

OkGoDoIt · 1y ago

I was hoping for an example PNG on the webpage to showcase that it actually works. I’m on my phone so I can’t do much with a downloaded zip file. But it would be cool to see that the PNG renders like a normal image on Safari mobile.

gildas (OP) · 1y ago

Note that if you're on iOS, it's possible that the HTML page doesn't work at all because when it's opened from the filesystem, it's displayed by a viewer which doesn't support JS instead of Safari.

Dwedit · 1y ago

It's the "Rennes JS User Group" image that you see in the middle of the HTML page.

a1o · 1y ago

I am also on my phone and found it weird that there wasn't a single online demo.

gildas (OP) · 1y ago

Here is the demo file (cf. the first paragraph and the end of the article): https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-PNG/raw/...

gildas (OP) · 1y ago

Note that you can also take advantage of the fact that a ZIP can be password-protected and make your web page secret! For example https://gildas-lormeau.github.io/private/ (password: "thisisapage").

jclarkcom · 1y ago

If you are loading external libraries, as in this example, your encrypted data is at risk. It would be better to include the decryption code directly in the JS, or to embed a JS zlib.

gildas (OP) · 1y ago

It's possible to define the Content Security Policy with a <META> tag in the "bootstrap page" and prevent this kind of security issue, e.g. <META http-equiv="content-security-policy" content="connect-src 'self' data: blob:;">

nhinck3 · 1y ago

You can also use the SubtleCrypto API

Retr0id · 1y ago

> a bug in “Archive Utility” on macOS prevents it from decompressing the resulting file

I looked into this in the past, it's because they check for a "PK" header at the start of the file - which is of course not actually required. I assumed it was deliberate because it does exclude most "weird" ZIPs.

By the way, if you're interested in this sort of file format wrangling, check out Ange Albertini's talk tomorrow at 38c3: https://fahrplan.events.ccc.de/congress/2024/fahrplan/talk/Q...
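To illustrate the point about the "PK" header, here is a minimal Python sketch. It assumes that simply prepending bytes to an archive is a fair stand-in for the polyglot layout (the real file embeds the ZIP data differently, and the prefix below is not a valid PNG). The helper `make_zip` and the file name `index.html` are hypothetical. Python's `zipfile` locates the archive from its end, so it does not care what the file starts with.

```python
import io
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8 bytes every PNG file starts with


def make_zip(files: dict[str, bytes]) -> bytes:
    """Build a small in-memory ZIP archive (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


plain_zip = make_zip({"index.html": b"<h1>hello</h1>"})

# Prepend non-ZIP bytes so the file no longer starts with "PK".
# This prefix is only a placeholder, not a complete PNG image.
prefixed = PNG_SIGNATURE + b"(stand-in for PNG chunks)" + plain_zip
starts_with_pk = prefixed.startswith(b"PK")

# zipfile finds the end-of-central-directory record from the end of the
# data and compensates for the prepended bytes.
with zipfile.ZipFile(io.BytesIO(prefixed)) as zf:
    names = zf.namelist()
    html = zf.read("index.html")
```

A tool that only inspects the first bytes would reject `prefixed`, while a reader that starts from the end of the file still opens it.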

Lammy · 1y ago

> it's because they check for a "PK" header at the start of the file

Lots of FOSS tooling will have a similar limitation due to the lack of support in the shared-mime-info spec for reading identifying features from the ends of files. Please vote/comment on this issue to voice your support: https://gitlab.freedesktop.org/xdg/shared-mime-info/-/issues...

garaetjjte · 1y ago

But the EOCD is not required to be at the end of the file either (well, it is, but it has that stupid comment field).
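The comment-field issue can be sketched in Python as follows. Because the end-of-central-directory (EOCD) record ends with a comment of up to 65535 bytes, a reader cannot look at a fixed offset from the end and has to scan backwards for the signature. The function `find_eocd` is a hypothetical helper, and requiring the comment to run exactly to the end of the data is a strict choice made for this sketch; real tools are often more lenient.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FIXED_SIZE = 22   # size of the record without its trailing comment
MAX_COMMENT = 0xFFFF   # the comment length is stored in a 16-bit field


def find_eocd(data: bytes) -> tuple[int, bytes]:
    """Return (offset, comment) of the EOCD record, scanning from the end."""
    lowest = max(0, len(data) - EOCD_FIXED_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIGNATURE, lowest)
    while pos != -1:
        if pos + EOCD_FIXED_SIZE <= len(data):
            # The comment length is the last field of the fixed part.
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            end = pos + EOCD_FIXED_SIZE + comment_len
            if end == len(data):  # strict: comment must reach the end
                return pos, data[pos + EOCD_FIXED_SIZE:end]
        # Otherwise keep looking for an earlier candidate signature.
        pos = data.rfind(EOCD_SIGNATURE, lowest, pos + 3)
    raise ValueError("no end-of-central-directory record found")


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hi")
    zf.comment = b"trailing comment"
archive = buf.getvalue()

offset, comment = find_eocd(archive)
```

With a non-empty comment, the record no longer sits in the last 22 bytes of the file, which is why the backward scan is needed.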

Dwedit · 1y ago

I think there's probably a much more efficient way to pack the correction data than JSON. For example, if you wanted to embed a 10MB video file in there, the correction data would be huge.

In the project there, correction data is used to recover bytes that have been changed into LF when they are actually CR or CRLF.

One idea is to store the correction data as binary, then read two bits every time you see a LF byte. It's either an actual LF, a CR, or a CRLF. The downside is that binary data itself could need correction as well, and encoding nearly 1-bit data in 2 bits is still wasteful (but simple). Packing five 3-state values into a byte is less wasteful and would eliminate forbidden symbols, but is still not optimal.
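A rough Python sketch of the "five 3-state values per byte" idea, since 3^5 = 243 fits in one byte. It assumes the only damage is that CR and CRLF are collapsed to LF, as described above; all function names are hypothetical, and this is not how the demo or SingleFile stores its correction data.

```python
def normalize_newlines(data: bytes) -> tuple[bytes, list[int]]:
    """Collapse CR and CRLF to LF and record one state per resulting LF.

    States: 0 = was LF, 1 = was CR, 2 = was CRLF.
    """
    out = bytearray()
    states: list[int] = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x0D:  # CR, possibly followed by LF
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                states.append(2)
                i += 2
            else:
                states.append(1)
                i += 1
            out.append(0x0A)
        else:
            if byte == 0x0A:
                states.append(0)
            out.append(byte)
            i += 1
    return bytes(out), states


def pack_trits(states: list[int]) -> bytes:
    """Pack five base-3 digits per byte; each byte is in the range 0..242."""
    packed = bytearray()
    for start in range(0, len(states), 5):
        value = 0
        for state in reversed(states[start:start + 5]):
            value = value * 3 + state
        packed.append(value)
    return bytes(packed)


def unpack_trits(packed: bytes, count: int) -> list[int]:
    """Inverse of pack_trits; count drops the padding of the last byte."""
    states: list[int] = []
    for value in packed:
        for _ in range(5):
            states.append(value % 3)
            value //= 3
    return states[:count]


def restore_newlines(normalized: bytes, states: list[int]) -> bytes:
    """Re-expand every LF according to its recorded state."""
    replacements = (b"\n", b"\r", b"\r\n")
    remaining = iter(states)
    out = bytearray()
    for byte in normalized:
        if byte == 0x0A:
            out += replacements[next(remaining)]
        else:
            out.append(byte)
    return bytes(out)


original = b"a\r\nb\rc\nd\r\r\n\n\n"
normalized, states = normalize_newlines(original)
packed = pack_trits(states)
restored = restore_newlines(normalized, unpack_trits(packed, len(states)))
```

Only 243 of the 256 byte values are ever produced, so a lookup table could remap them to skip up to 13 unwanted bytes such as CR and LF; that remapping is left out of this sketch.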

gildas (OP) · 1y ago

You're right, SingleFile (which is capable of saving pages in this format) does a little better than the demo, but it can also be optimized. In fact, I chose the JSON format to keep things as simple and didactic as possible for the presentation. I think I need to use your suggestions to optimize this structure in SingleFile ;)

ElectricalUnion · 1y ago

I believe at that point (huge blobs compared to small amounts of plaintext strings), it's easier to embed a universal binary web server and have it serve the contents of the zip, like https://redbean.dev/

zzo38computer · 1y ago

I would probably prefer to use text other than "Please wait..." since it won't work if JavaScript is disabled. This can be fixed by changing the text to something such as "This is an HTML/ZIP/PNG polyglot file". And then, omit the <title> to save space.

The URL jar:https://raw.githubusercontent.com/gildas-lormeau/Polyglot-HT... can be used to display the HTML file in some web browsers, although it cannot display the PNG file in this way since it uses # as the URL of the picture.

gildas (OP) · 1y ago

A <noscript> tag would be even more suitable, but I agree with the principle. I added a link to view the demo without downloading the file, see https://gildas-lormeau.github.io/Polyglot-HTML-ZIP-PNG/demo.... (it was not working previously because GitHub serves pages in UTF-8).

lifthrasiir · 1y ago

> The bootstrap page is now encoded in windows-1252, which allows data to be read from the DOM with minimum degradation.

This is not always the case if the encoded content happens to have `-->`, for example. A better approach would be the `<plaintext>` element which can never be closed.

gildas (OP) · 1y ago

Indeed, for example the HTML of the files used for the presentation slides [1] uses <noframes> tags to keep the HTML well-formed. This point is addressed in the conclusion of the presentation.

[1] https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-PNG/raw/...

nhinck3 · 1y ago

I don't think you need any external libraries to do this anymore, thanks to DecompressionStream.
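DecompressionStream is a browser API, so the following is only a Python analogy of why a full ZIP library may be unnecessary: a DEFLATE-compressed ZIP entry is a plain raw DEFLATE stream sitting after a small header. The helper `inflate_entry` is hypothetical, and it still leans on `zipfile` to look up the entry's offset and size. In a browser, the matching call would presumably be `new DecompressionStream("deflate-raw")`, which is an assumption to check against the browsers you target.

```python
import io
import struct
import zipfile
import zlib


def inflate_entry(archive: bytes, name: str) -> bytes:
    """Decompress one ZIP entry using only a raw DEFLATE decoder."""
    # Look up where the entry lives; the central directory knows its sizes.
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_DEFLATED:
        raise ValueError("entry is not DEFLATE-compressed")

    # Local file header: 30 fixed bytes, then the file name and extra field.
    # The two length fields are at offsets 26 and 28 of that header.
    name_len, extra_len = struct.unpack_from(
        "<HH", archive, info.header_offset + 26
    )
    start = info.header_offset + 30 + name_len + extra_len
    raw = archive[start:start + info.compress_size]

    # wbits=-15 selects raw DEFLATE, without zlib or gzip framing.
    return zlib.decompress(raw, wbits=-15)


payload = b"<p>hello</p>" * 50
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("page.html", payload)
archive = buf.getvalue()

page = inflate_entry(archive, "page.html")
```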

creshal · 1y ago

Thank $DEITY we don't have to care about IE compatibility any more.

EmileSonneveld · 1y ago

Could they embed “zip.min.js” too? It is not a single file otherwise

gildas (OP) · 1y ago

Thanks, it was an error on my part, now corrected.

porridgeraisin · 1y ago

> However, there’s a problem: due to the same-origin policy, retrieving ZIP data directly with fetch("") fails when the page is opened from the filesystem (except in Firefox).

  chromium --allow-file-access-from-files
