I looked into this in the past: it's because they check for a "PK" header at the start of the file, which is of course not actually required (ZIP readers locate the archive through the end-of-central-directory record at the end of the file). I assumed it was deliberate, because it does exclude most "weird" ZIPs.
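To illustrate the difference, here is a rough sketch of the two detection strategies. The function names are made up for illustration. The first only looks for "PK" at offset 0. The second scans backwards for the End Of Central Directory (EOCD) signature `PK\x05\x06`, which sits in the last 22 + 65535 bytes of a ZIP (22-byte fixed record plus an optional comment of up to 65535 bytes). It is deliberately strict: it assumes no trailing bytes after the archive comment.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
EOCD_MIN_SIZE = 22      # fixed part of the End Of Central Directory record
MAX_COMMENT = 0xFFFF    # the archive comment length is a 16-bit field


def looks_like_zip_by_header(data: bytes) -> bool:
    """Naive check: a ZIP signature at the very start of the file."""
    return data[:2] == b"PK"


def looks_like_zip_by_eocd(data: bytes) -> bool:
    """Search backwards from the end for a plausible EOCD record."""
    start = max(0, len(data) - EOCD_MIN_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIG, start)
    while pos != -1:
        if pos + EOCD_MIN_SIZE <= len(data):
            # The comment length is the last 2 bytes of the fixed record (little-endian).
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            # Strict assumption: the comment must end exactly at end of file.
            if pos + EOCD_MIN_SIZE + comment_len == len(data):
                return True
        pos = data.rfind(EOCD_SIG, start, pos)
    return False
```

With an HTML prefix in front of the archive, the header check fails while the EOCD check still succeeds, which is exactly the kind of file a "PK at offset 0" rule rejects.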
By the way, if you're interested in this sort of file format wrangling, check out Ange Albertini's talk tomorrow at 38c3: https://fahrplan.events.ccc.de/congress/2024/fahrplan/talk/Q...
Lots of FOSS tooling will have a similar limitation due to the lack of support in the shared-mime-info spec for reading identifying features from the ends of files. Please vote/comment on this issue to voice your support: https://gitlab.freedesktop.org/xdg/shared-mime-info/-/issues...
In the project there, correction data is used to recover bytes that were changed into LF when they were originally CR or CRLF.
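A minimal sketch of the idea, assuming the loss comes from HTML-style newline normalization (CRLF and lone CR both become LF). The function names and the 0/1/2 state encoding are hypothetical, not taken from the project: one function records what each resulting LF used to be, the other uses that record to put the original bytes back.

```python
LF, CR, CRLF = 0, 1, 2  # hypothetical encoding of the three possible originals


def normalize_newlines(raw: bytes) -> tuple[bytes, list[int]]:
    """Turn CRLF and lone CR into LF, recording the original form of each LF."""
    out = bytearray()
    states = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b == 0x0D:  # CR, possibly followed by LF
            if i + 1 < len(raw) and raw[i + 1] == 0x0A:
                states.append(CRLF)
                i += 2
            else:
                states.append(CR)
                i += 1
            out.append(0x0A)
        elif b == 0x0A:  # a genuine LF
            states.append(LF)
            out.append(0x0A)
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out), states


def restore_line_endings(data: bytes, states: list[int]) -> bytes:
    """Replace each LF with whatever it originally was, using the correction data."""
    originals = {LF: b"\n", CR: b"\r", CRLF: b"\r\n"}
    out = bytearray()
    it = iter(states)
    for b in data:
        if b == 0x0A:
            out += originals[next(it)]
        else:
            out.append(b)
    return bytes(out)
```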
One idea is to store the correction data as binary, then read two bits every time you see an LF byte, since each LF was originally either an actual LF, a CR, or a CRLF. The downside is that the binary data itself could need correction as well, and encoding about 1.58 bits of information (log2 3) in 2 bits is still wasteful (but simple). Packing five 3-state values into a byte (3^5 = 243 ≤ 256) is less wasteful and would leave 13 byte values unused, so forbidden symbols such as CR and LF can be avoided, but it is still not optimal.
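A rough sketch of the five-values-per-byte packing, assuming the forbidden bytes are exactly CR (0x0D) and LF (0x0A) and reusing a hypothetical 0/1/2 encoding for LF/CR/CRLF. Each group of five states is read as a base-3 number in 0..242, which is then mapped onto the first 243 byte values that are not forbidden, so the packed correction data never contains a byte that would itself get normalized.

```python
FORBIDDEN = {0x0A, 0x0D}  # assumption: the packed data must never contain LF or CR
ALPHABET = [b for b in range(256) if b not in FORBIDDEN][:243]  # 3**5 == 243 codes
DECODE = {b: i for i, b in enumerate(ALPHABET)}

LF, CR, CRLF = 0, 1, 2  # hypothetical encoding of the three states


def pack_trits(trits: list[int]) -> bytes:
    """Pack 3-state values five at a time into bytes that avoid FORBIDDEN."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        value = 0
        for t in reversed(trits[i:i + 5]):  # first value is the least significant digit
            value = value * 3 + t
        out.append(ALPHABET[value])  # a short last group is implicitly zero-padded
    return bytes(out)


def unpack_trits(data: bytes, count: int) -> list[int]:
    """Inverse of pack_trits; `count` drops the padding from the last byte."""
    trits = []
    for byte in data:
        value = DECODE[byte]
        for _ in range(5):
            trits.append(value % 3)
            value //= 3
    return trits[:count]
```

This costs 8/5 = 1.6 bits per value against the log2 3 ≈ 1.585-bit minimum, which matches "less wasteful, but still not optimal"; getting closer would need something like arithmetic coding over the whole stream.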
The URL jar:https://raw.githubusercontent.com/gildas-lormeau/Polyglot-HT... can be used to display the HTML file in some web browsers, although the PNG cannot be displayed this way, because the page uses `#` (i.e. the page itself) as the URL of the picture.
This doesn't always hold: it breaks if the encoded content happens to contain `-->`, for example. A better approach would be the `<plaintext>` element, which can never be closed.
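A small sketch of that trade-off, with a hypothetical helper name. It wraps the payload in an HTML comment only when nothing inside could end the comment early. Per the HTML tokenizer, that means `-->`, the error-tolerant `--!>`, and a payload starting with `>` or `->` (which would form `<!-->` or `<!--->`). Otherwise it falls back to `<plaintext>`, whose content runs to the end of the document.

```python
# Sequences that terminate an HTML comment opened with "<!--".
COMMENT_TERMINATORS = (b"-->", b"--!>")


def comment_is_safe(payload: bytes) -> bool:
    # "<!-->" and "<!--->" close the comment immediately.
    if payload.startswith(b">") or payload.startswith(b"->"):
        return False
    return not any(t in payload for t in COMMENT_TERMINATORS)


def hide_payload(payload: bytes) -> bytes:
    """Hypothetical helper: keep non-HTML bytes out of the rendered page."""
    if comment_is_safe(payload):
        return b"<!--" + payload + b"-->"
    # <plaintext> puts the tokenizer in the PLAINTEXT state until end of file,
    # so no byte sequence in the payload can close it.
    return b"<plaintext>" + payload
```

Note that `<plaintext>` still shows the payload as text on the page; it only guarantees that no later markup gets parsed.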
chromium --allow-file-access-from-files