“The PDF Association operates under a strict principle—any new feature must work seamlessly with existing readers” followed by introducing compression as a breaking change in the same paragraph.
All this for Brotli… On a read-many format like PDF, zstd's decompression speed is a much better fit.
Brotli w/o a custom dictionary is a weird choice to begin with.
That said, I personally prefer zstd as well, it's been a great general use lib.
zstd is Pareto better than brotli - compresses better and faster
Nevertheless, I expect this to be JBIG2 all over again: almost nobody will use this because we've got decades of devices and software in the wild that can't, and 20% filesize savings is pointless if your destination can't read the damn thing.
I have not tried using a dictionary for zstd.
Imagine a sales meeting where someone pitched that to you. They have to be joking, right?
I have no objection to adding Brotli, but I hope they take compatibility more seriously. You may need readers to support it for a long time - ten years? - before you deploy it in PDF creation tools.
You're absolutely right! It's not just an inaccurate slogan—it's a patronizing use of artificial intelligence. What you're describing is not just true, it's precise.
brotli decompression is already plenty fast. For PDFs, zstd’s advantage in decompression speed is academic.
Here's discussion by brotli's and zstd's staff:
Something like this:
https://developer.chrome.com/blog/shared-dictionary-compress...
In my applications, in the area of 3D, I've been moving away from Brotli because it is just so slow for large files. I prefer zstd, because it is like 10x faster for both compression and decompression.
So it might land in the spec once it has proven it offers enough value.
The standard Brotli dictionary bakes in a ton of assumptions about what the Web looked like in 2015, including not just which HTML tags were particularly common but also such things as which swear words were trendy.
It doesn't seem reasonable to think that PDFs have symbol probabilities remotely similar to the web corpus Google used to come up with that dictionary.
On top of that, it seems utterly daft to be baking that into a format which is expected to fit archival use cases and thus impose that 2015 dictionary on PDF readers for a century to come.
I too would strongly prefer that they use zstd.
The sole exception is if they restart the Brotli stream for each page without sharing a dictionary, custom or inferred, across the whole document. In that case the dictionary has to be re-inferred on every page, and a shared custom dictionary would make more sense.
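For what it's worth, a shared dictionary of that sort can be built with the stock zstd CLI. A minimal sketch, assuming a directory of per-page content streams (the paths and file names here are made up for illustration):

```shell
# Train a dictionary from many small samples (e.g. per-page
# content streams extracted from the document):
zstd --train pages/*.bin -o pages.dict

# Compress and decompress one sample against the shared dictionary:
zstd -q  -D pages.dict pages/page-001.bin -o page-001.zst
zstd -q -dD pages.dict page-001.zst       -o page-001.out
cmp pages/page-001.bin page-001.out && echo round-trip-ok
```

Dictionary training tends to pay off exactly in this scenario: many small, similar inputs compressed independently.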
Am I missing something? Adoption will take a long time if you can't be confident the receiver of a document or viewers of a publication will be able to open the file.
Because I'm doing the work to patch in support across different viewers to help adoption grow. Once the big open-source ones (pdf.js, Poppler, PDFium) ship it, adoption can rise quickly.
>"Please wait... If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document. You can upgrade to the latest version of Adobe Reader for Windows®, Mac, or Linux® by visiting http://www.adobe.com/go/reader_download. For more assistance with Adobe Reader visit http://www.adobe.com/go/acrreader. Windows is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries. Mac is a trademark of Apple Inc., registered in the United States and other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries."
The (USA) Wisconsin Dept. of Natural Resources publishes nearly all their regulation PDFs as these XFA non-PDFs that I cannot read. So I cannot know the regulations. My emails about this (a dozen of them, to multiple addresses, over many years) have gone unanswered.
If Acrobat supports it, it doesn't matter what the spec says. Until Adobe drops XFA from Acrobat and forces these extremely silly people to stop, PDF is no longer PDF.
- when jumping from page to page, you won’t have to decompress the entire file
Okay, so we make a compressed container format that can perform such shenanigans, for the same amount of back-compat issues as extending PDF in this way.
> when jumping from page to page, you won’t have to decompress the entire file
This is already a thing with any compression format that supports quasi-random access, which is most of them. The answers to https://stackoverflow.com/q/429987/5223757 discuss a wide variety of tools for producing (and seeking into) such files, which can be read normally by tools not familiar with the conventions in use.
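The trick works even with plain gzip, since concatenated gzip members form a single valid stream: an index-aware tool can seek to a member boundary and decompress only that block, while any stock gzip reads the whole file normally. A minimal sketch (file names are hypothetical):

```shell
# Block-wise gzip for quasi-random access: compress each 1 MiB
# block as an independent gzip member, then concatenate them.
split -b 1M big.bin block.
for b in block.*; do gzip -c "$b"; done > big.gz
rm -f block.*

# Any stock gzip can still read the whole thing:
gzip -dc big.gz | cmp - big.bin && echo round-trip-ok
```

To actually seek, you would additionally record each member's byte offset; readers unaware of that index just see an ordinary gzip file.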
Far from the same amount:
- existing tools that split PDFs into pages will remain working
- if defensively programmed, existing PDF readers will be able to render PDFs containing JPEG XL images, except for the images themselves.
Though we might still want to restrict the subset of PostScript that we allow. The full language might be a bit too general to take from untrusted third parties.
I suspect PDF was fairly sane in the initial incarnation, and it's the extra garbage that they've added since then that is a source of pain.
I'm not a big fan of this additional change (nor any of the javascript/etc), but I would be fine with people leaving content streams uncompressed and running the whole file through brotli or something.
PDF is also a binary format.
"Brotli is a compression algorithm developed by Google."
They have no idea about Zstandard or ANS/FSE, comparing it with LZ77.
Sheer incompetence.
I just took all PDFs I had in my downloads folder (55, totaling 47M). These are invoices, data sheets, employment contracts, schematics, research reports, a bunch of random stuff really.
I compressed them all with 'zstd --ultra -22', 'brotli -9', 'xz -9' and 'gzip -9'. Here are the results:
+------+------+-----+------+--------+
| none | zstd | xz  | gzip | brotli |
+------+------+-----+------+--------+
| 47M  | 45M  | 39M | 38M  | 37M    |
+------+------+-----+------+--------+
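If anyone wants to reproduce this on their own downloads folder, a sketch along these lines should work (flags match the ones quoted above; using `wc -c` on the compressor's stdout totals exact bytes and avoids filesystem block-size rounding):

```shell
# Compress every PDF with each tool and total the exact output bytes.
for tool in "zstd --ultra -22 -c" "brotli -9 -c" "xz -9 -c" "gzip -9 -c"; do
  total=0
  for f in *.pdf; do
    n=$($tool < "$f" | wc -c)
    total=$((total + n))
  done
  echo "$tool: $total bytes"
done
```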
Here's a table with all the files:

+------+------+------+------+--------+
| raw | zstd | xz | gzip | brotli |
+------+------+------+------+--------+
| 12K | 12K | 12K | 12K | 12K |
| 20K | 20K | 20K | 20K | 20K | x5
| 24K | 20K | 20K | 20K | 20K | x5
| 28K | 24K | 24K | 24K | 24K |
| 28K | 24K | 24K | 24K | 24K |
| 32K | 20K | 20K | 20K | 20K | x3
| 32K | 24K | 24K | 24K | 24K |
| 40K | 32K | 32K | 32K | 32K |
| 44K | 40K | 40K | 40K | 40K |
| 44K | 40K | 40K | 40K | 40K |
| 48K | 36K | 36K | 36K | 36K |
| 48K | 48K | 48K | 48K | 48K |
| 76K | 128K | 72K | 72K | 72K |
| 84K | 140K | 84K | 80K | 80K | x7
| 88K | 136K | 76K | 76K | 76K |
| 124K | 152K | 88K | 92K | 92K |
| 124K | 152K | 92K | 96K | 92K |
| 140K | 160K | 100K | 100K | 100K |
| 152K | 188K | 128K | 128K | 132K |
| 188K | 192K | 184K | 184K | 184K |
| 264K | 256K | 240K | 244K | 240K |
| 320K | 256K | 228K | 232K | 228K |
| 440K | 448K | 408K | 408K | 408K |
| 448K | 448K | 432K | 432K | 432K |
| 516K | 384K | 376K | 384K | 376K |
| 992K | 320K | 260K | 296K | 280K |
| 1.0M | 2.0M | 1.0M | 1.0M | 1.0M |
| 1.1M | 192K | 192K | 228K | 200K |
| 1.1M | 2.0M | 1.1M | 1.1M | 1.1M |
| 1.2M | 1.1M | 1.0M | 1.0M | 1.0M |
| 1.3M | 2.0M | 1.1M | 1.1M | 1.1M |
| 1.7M | 2.0M | 1.7M | 1.7M | 1.7M |
| 1.9M | 960K | 896K | 952K | 916K |
| 2.9M | 2.0M | 1.3M | 1.4M | 1.4M |
| 3.2M | 4.0M | 3.1M | 3.1M | 3.0M |
| 3.7M | 4.0M | 3.5M | 3.5M | 3.5M |
| 6.4M | 4.0M | 4.1M | 3.7M | 3.5M |
| 6.4M | 6.0M | 6.1M | 5.8M | 5.7M |
| 9.7M | 10M | 10M | 9.5M | 9.4M |
+------+------+------+------+--------+
Zstd is surprisingly bad on this data set. I'm guessing it struggles with the already-compressed image data in some of these PDFs. Going by compression ratio alone, brotli is clearly the best here and zstd the worst. You'd have to find some other reason (decompression speed, spec complexity, or maybe you just trust Facebook more than Google) to choose zstd over brotli, going by my results.
I wish I could share the data set for reproducibility, but I obviously can't just share every PDF I happened to have laying around in my downloads folder :p
Here's a table with the correct sizes, reported by 'du -A' (which shows the apparent size):
+--------+--------+--------+--------+--------+
| none   | zstd   | xz     | gzip   | brotli |
+--------+--------+--------+--------+--------+
| 47.81M | 37.92M | 37.96M | 38.80M | 37.06M |
+--------+--------+--------+--------+--------+
These numbers are much more impressive. Still, Brotli has a slight edge.

Something is going terribly wrong with `zstd` here, where it is reported to compress a 1.1MB file to 2MB. Zstd should never grow the file size by more than a very small percentage, like any reasonable compressor. Am I interpreting it correctly that you're doing something like `zstd -22 --ultra $FILE && wc -c $FILE.zst`?
If you can reproduce this behavior, can you please file an issue with the zstd version you are using, the commands used, and, if possible, the file producing this result?
qpdf --stream-data=uncompress in.pdf out.pdf
The resulting file should compress better with zstd.

I keep a bunch of comics in PDF, but JPEG XL is by far the best way to enjoy them in terms of disk space.
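The qpdf pipeline above, sketched end to end (file names hypothetical): expand the per-stream Flate compression inside the PDF so the outer compressor can see the redundancy across streams, then recompress the whole file.

```shell
# Expand the internal Flate streams, then recompress the whole
# file with a modern codec:
qpdf --stream-data=uncompress in.pdf expanded.pdf
zstd --ultra -22 expanded.pdf -o out.pdf.zst

ls -l in.pdf out.pdf.zst   # compare sizes
```

Note the result is no longer a PDF any viewer can open directly; it's a fairer way to benchmark the codecs, not a distribution format.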
[1]: https://pdfa.org/wp-content/uploads/2025/10/PDFDays2025-Brea...
But reading the article I realized that PDF has become ubiquitous because of its insistence on backwards compatibility. Maybe for some things it's good to move this slowly.
The PDF format is versioned, and in the past new versions have introduced things like new types of encryption. It’s quite probable that a v1.7 compliant PDF won’t open on a reader app written when v1.3 was the latest standard.
If size was important to users then it wouldn't be so common that systems providers crap out huge PDF files consisting mainly of layout junk 'sophistication' with rounded borders and whatnot.
The PDF/A stuff I've built stays under 1 MB for hundreds of pages of information, because it's text placed in a typographically sensible manner.
ISO is pay to play so :shrug:
So your comment is a falsehood
https://pdfa.org/brotli-compression-coming-to-pdf/
> As of March 2025, the current development version of MuPDF now supports reading PDF files with Brotli compression. The source is available from github.com/ArtifexSoftware/mupdf, and will be included as an experimental feature in the upcoming 1.26.0 release.
> Similarly, the latest development version of Ghostscript can now read PDF files with Brotli compression. File creation functionality is underway. The next official Ghostscript release is scheduled for August this year, but the source is available now from github.com/ArtifexSoftware/Ghostpdl.
MuPDF is an excellent PDF reader, the fastest that I have ever tested. There are plenty of big PDF files where most other readers are annoyingly slow.
It is my default PDF and EPUB reader; only in very rare cases do I encounter PDF files that MuPDF cannot understand, and then I fall back to other PDF readers (e.g. Okular).