0: https://www.nayuki.io/page/optimal-text-segmentation-for-qr-...
In general, better compression means output that looks more like "randomness"—any redundancy implies there was room for more compression—and that figure makes this quite clear visually!
I used this for digital QR code tickets [2], and it made the codes so much easier to scan, even with bad lighting.
[1] https://news.ycombinator.com/item?id=39094251
[2] https://workspace.google.com/marketplace/app/qr_code_ticket_...
64-bit chunks are a little worse, with 4.16% overhead, so it might be worth dealing with the slight extra complexity of 63-bit chunks.
I would also output the decimal digits in little-endian order.
edit: If you are willing to go for larger chunks then 93bit chunks would be my next candidate, there the overhead is 0.36%, barely more than pure base10's 0.34%. I don't think it's worth going any higher.
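A minimal sketch of the 63-bit-chunk idea (the function names and the whole-input-as-one-integer framing are my own assumptions, not from any standard): since 2^63 < 10^19, every 63-bit chunk fits in exactly 19 decimal digits, emitted little-endian as suggested above.

```python
def encode63(data: bytes) -> str:
    """Encode bytes as decimal digits, 19 digits per 63-bit chunk.

    2**63 < 10**19, so each 63-bit chunk always fits in 19 digits.
    Digits are emitted little-endian (least significant first).
    """
    # Treat the whole input as one big little-endian integer,
    # then peel off 63 bits at a time.
    n = int.from_bytes(data, "little")
    nbits = len(data) * 8
    out = []
    for shift in range(0, nbits, 63):
        chunk = (n >> shift) & ((1 << 63) - 1)
        out.append(f"{chunk:019d}"[::-1])  # reverse to little-endian digit order
    return "".join(out)

def decode63(digits: str, nbytes: int) -> bytes:
    """Inverse of encode63; nbytes is the original byte length."""
    n = 0
    for i in range(0, len(digits), 19):
        n |= int(digits[i:i + 19][::-1]) << (63 * (i // 19))
    return n.to_bytes(nbytes, "little")
```

The digits then go into a QR Numeric-mode segment as usual; the 19-digit groups cost 19 × 10/3 ≈ 63.3 bits each, which is where the ~0.5% overhead comes from.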
<html><body><script>document.body.innerHTML = decodeURI(window.location.hash.substring(1))</script></body></html>
So you can point to https://srv.us/d#<h1>Demo</h1>https://github.com/Chia-Network/hsms/blob/main/hsms/util/qri...
Anyone who wants to use this, feel free.
"',/[]\
Incidentally, this also makes them JSON-safe.

Base94 uses all printable characters, and Base122 uses both printable characters and whitespace.
UUIDs encoded in various alphabets:
len algo value
24 Base64 padded wScmB8cVS/K05Wk+nORR8Q==
22 Base64 unpadded osnQ3DUDTDuUQBc9mBRYFw
20 Base85 rHoLuTk%W0fgpY+`c>xc
20 Base94 d(+H"Q/hP}i}d9<KeAt)%
18 Base122 @#FoALt`92vSt@
qrencode -t UTF8 https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?data=eyJ0IjoiY292aWQxOV9idXNpbmVzcyIsImJpZCI6IjEyMTMyMSIsImJuYW1lIjoiVGVzdCBOU1cgR292ZXJubWVudCBRUiBjb2RlIiwiYmFkZHJlc3MiOiJCdXNpbmVzcyBhZGRyZXNzIGdvZXMgaGVyZSAifQ==
vs qrencode -t UTF8 https://www.service.nsw.gov.au/campaign/service-nsw-mobile-app?data=072685680885510189821994892577900638215789419258463239488533499278955911240512279111633336286737089008384293066931974311305533337894591404330656702603998035920596585517131555967430155259257402711671699276432408209151397638174974409842883898456527289026013404155725275860173673194594939
The latter one is actually smaller. TIL. I'm not sure the qrencode CLI tool will automatically do this for you.
> In a URL, the rest of the URL is not purely numeric, so actually seeing the benefits of this encoding requires using two segments:
> * one with the “boring” bits of the URL at the start, likely using the Binary mode
> * one with the big blob of base10 data, using the Numeric mode
If I'm looking at the correct repository, it does [1].
[1] https://github.com/fukuchi/libqrencode/blob/master/split.c
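To make the two-segment saving concrete, here is a rough bit-cost calculator (my own sketch, not from libqrencode; it assumes QR versions 10–26, where the character-count field is 16 bits for Byte mode and 12 bits for Numeric mode, and it ignores error correction):

```python
def byte_mode_bits(n_chars: int) -> int:
    # 4-bit mode indicator + 16-bit count field (versions 10-26) + 8 bits/char
    return 4 + 16 + 8 * n_chars

def numeric_mode_bits(n_digits: int) -> int:
    # 4-bit mode indicator + 12-bit count field (versions 10-26),
    # then 10 bits per group of 3 digits (4 or 7 bits for a trailing 1 or 2)
    tail = {0: 0, 1: 4, 2: 7}[n_digits % 3]
    return 4 + 12 + 10 * (n_digits // 3) + tail

prefix = "https://example.com/t?data="  # hypothetical "boring" URL prefix
digits = 280                            # hypothetical base10 payload size

one_segment  = byte_mode_bits(len(prefix) + digits)
two_segments = byte_mode_bits(len(prefix)) + numeric_mode_bits(digits)
print(one_segment, two_segments)  # the split version is far smaller
```

The 20-bit fixed cost of the second segment header is quickly repaid: Numeric mode stores each digit in ~3.33 bits instead of Byte mode's 8.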
Figure 8 and its surrounding section are the undamaged case.
> I also tested only one background image, so the behaviour may differ greatly with QR codes contained in different surrounds.
This likely does not matter much. It could theoretically affect binarization near the edges of the code (near module boundaries, depending on how you did the resizing), but in practice as long as the code itself is high-contrast, this is unlikely. The more usual issue is that real images often do not have a proper quiet zone around the code, but that is mostly going to be irrelevant for what you are trying to test here.
> The QR codes are generated to be perfectly rectangular and aligned to the image pixel grid, which is unlikely to happen in the real world.
This is a much bigger deal. A large source of decoding errors for larger versions (for a fixed "field of view") is due to alignment / sampling issues. A lot of work goes into trying to find where the code is in the image and identify the grid pattern, and that is just inherently less reliable for larger versions, particularly if there is projective distortion (so the module size is not constant).

The periodic alignment patterns try to keep the number of parameters that can be used to fit this grid roughly constant relative to the number of modules in the grid, but locating those patterns is itself error-prone and subject to false positives (they are not nearly as unique-looking as finder patterns), and the initial global transform estimate has to get pretty close for them to work.

I am actually happy that damaging these was not causing you more trouble. This is definitely somewhere that ZBar can be improved. It currently does not use the timing patterns at all, for example. I'm not actually aware of an open-source QR decoder that does.
(I'm the original author of ZBar's QR decoder)
Huffman is probably simple enough. The typical approach is adaptive Huffman, which doesn't compress the initial data very well since it needs time to adjust to the actual character frequencies. That wouldn't work well for QR codes, since their payloads are short.
But you can start adaptive Huffman with a pre-agreed initial tree (as static Huffman does), which would give good compression from the start. There could be several standard pre-agreed Huffman trees, and instead of using bits in the QR code to select a character set, those bits could select one of a few pre-agreed initial Huffman trees.
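A minimal sketch of the pre-agreed-tree idea (the frequency table here is invented for illustration; a real deployment would standardize one table per use case): both encoder and decoder derive the same code from a shared table, so no tree is transmitted and compression is good from the very first character.

```python
import heapq

def build_code(freqs: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code from a pre-agreed symbol->frequency table."""
    # Heap entries: (weight, unique tiebreak, {symbol: partial code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prepending a branch bit.
        merged = {s: "0" + c for s, c in c1.items()}
        merged |= {s: "1" + c for s, c in c2.items()}
        heapq.heappush(heap, (w1 + w2, i, merged))
        i += 1
    return heap[0][2]

# Invented example table: URL-ish text, so '/' and '.' are common.
FREQS = {"a": 8, "e": 12, "t": 9, "o": 7, "/": 10, ".": 6,
         "h": 5, "p": 5, "s": 6, ":": 2}
CODE = build_code(FREQS)

def encode(text: str) -> str:
    return "".join(CODE[ch] for ch in text)

def decode(bits: str) -> str:
    rev = {v: k for k, v in CODE.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in rev:  # Huffman codes are prefix-free, so this is unambiguous
            out.append(rev[buf])
            buf = ""
    return "".join(out)
```

A few header bits in the QR payload could then select which standardized table was used, exactly as the comment suggests.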
Of course, pure binary (byte) encoding is best, but many systems have the constraint of text characters or non-control characters. With that constraint, alphanumeric encoding is best.
I sort of assumed this was common knowledge, but I guess not.
I've added an analysis of many more bases to the article: https://huonw.github.io/blog/2024/03/qr-base10-base64/#fn:ot...
The special encoding is just about sending data to the backend?
2. I'd have called this base-1000: it uses 3-digit numbers encoded into 10 bits. Base64 doesn't encode into 64 bits; it uses 64 characters encoded into 6 bits. And this encoding uses 000 to 999, encoded into 10 bits. But that messes up the title once you compare apples to apples: 1000 > 64 is just obviously true.
But yes, you're right, it would be reasonable to think of this as encoding the bytes in base 1000, where each "digit" just happens to be shown to humans as 3 digits.
We know of only two open-source JS projects that even support alphanumeric: Nayuki's and Paul’s (https://github.com/Cyphrme/QRGenJS). We have it hosted here: https://cyphr.me/qrgen
I’ve also done a lot of work on this problem: https://image-ppubs.uspto.gov/dirsearch-public/print/downloa...
Also, regarding alphanumeric, RFC 3986 states that:
> An implementation should accept uppercase letters as equivalent to lowercase in scheme names (e.g., allow “HTTP” as well as “http”)
base8 in numeric mode: 8 input bits -> 3 digits -> 10 output bits, 25% overhead
base32 in alphanumeric mode: 5 input bits -> 1 character -> 5.5 output bits, 10% overhead
I would prefer base32 out of these too, but it's interesting that even base8 beats base64 here.
0: https://en.wikipedia.org/wiki/QR_code#Information_capacity
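The overheads above can be checked in a few lines (the byte-wise base8 grouping, one input byte per 3 octal digits, is my reading of the parent comment; a continuous octal bit stream would do slightly better):

```python
from math import log2

# QR mode costs per character (ignoring segment headers):
# numeric: 10 bits / 3 digits, alphanumeric: 11 bits / 2 chars, byte: 8 bits/char.
base8  = 10 / 8 - 1               # 1 byte -> 3 octal digits -> 10 bits, numeric mode
base32 = (11 / 2) / 5 - 1         # 5 input bits -> 1 char -> 5.5 bits, alphanumeric mode
base64 = 8 / 6 - 1                # 6 input bits -> 1 char -> 8 bits, byte mode
base10 = (10 / 3) / log2(10) - 1  # pure decimal stream in numeric mode

print(f"base8 {base8:.1%}, base32 {base32:.1%}, "
      f"base64 {base64:.1%}, base10 {base10:.2%}")
```

This reproduces the 25% / 10% / 33% / 0.34% figures quoted in the thread.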
https://i.imgur.com/cAVbqka.png
Because of quirks, decimal is more efficient in some edge cases, but overall alphanumeric is better in QR codes.
Abstractly, it requires approximately log(45)/log(16) output bits per input bit, an overhead of 37%.
Making this more concrete: each input byte is encoded as two hex digits, and two hex digits have to be encoded as two Alphanumeric characters. It thus takes 11 bits in the QR code bit stream to store 8 bits of input.
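Checking both figures (this is just arithmetic on the numbers already in the comment):

```python
from math import log

# Abstract: an alphanumeric char carries log2(45) bits; a hex digit carries 4.
abstract = log(45) / log(16)  # output bits per input bit, ~1.373
# Concrete: 1 input byte -> 2 hex digits -> 2 alnum chars -> 11 QR bits.
concrete = 11 / 8             # exactly 1.375

print(f"{abstract - 1:.1%} vs {concrete - 1:.1%}")  # ~37.3% vs 37.5%
```

The small gap exists because 11 bits can hold 2048 values but two alphanumeric characters only use 45² = 2025 of them.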
https://spec.smarthealth.cards/#encoding-qrs
It's well supported by scanners but can create unwieldy values for users to copy/paste.
For more recent work with dynamic content (and the assumption that a web server is involved in the flow), we're just limiting the payload size and using ordinary byte mode (https://docs.smarthealthit.org/smart-health-links/spec)
2 alphanumerics (=4000 links) is plenty to encode a link to all the major pages of your website/service you may want to advertise. 10 alphanumerics (=10^18) is plenty that even if every person in the world had a QR code, nobody could guess one before hitting your rate limiter.
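Assuming "alphanumerics" here means the full case-sensitive set of 62 characters (note this is larger than QR alphanumeric mode's 45-character set), the figures work out roughly as stated:

```python
short = 62 ** 2   # 3844, i.e. roughly 4000 distinct links
long_ = 62 ** 10  # ~8.4e17, on the order of 10**18
print(short, f"{long_:.1e}")
```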
The user experience gained by fast reliable scanning is far greater than that enabled by slightly improved offline support (offline functionality requires that the user already has your app installed, and in that case, you could preload any codes that user had access to).
Wouldn’t this break Deep/Universal links which send a user directly to a specific location within an app?
I get that there are potential security/privacy concerns, but if you are in full control of the URL schemes, isn’t that the purpose of this feature?
More importantly, it's enough links that at the Landauer limit a collision can't happen without consuming ~300,000 solar systems of energy, vastly beyond human technological ability. With this property, each link can also be considered private.