> that also requires the decompression algorithm to be similarly aware, which can be a problem if you're distributing the compressed bits widely.
Well, not necessarily. An HTML-aware algorithm could, for example, emit attributes in the same order everywhere, because it knows attribute order doesn't matter.
Actually that would be a nice addition to the HTML "compressors" out there.
That's a good point. You could have an HTML-aware "precompressor" prepare the HTML for a general-use compression algorithm. However, with end-to-end HTML awareness I think you could do even better.
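To make the "precompressor" idea concrete, here is a rough Python sketch, not anyone's production code. The names `precompress`, `ATTR` and `TAG` are made up for illustration, and the regexes assume every attribute is written as `name="value"` with double quotes; tags with bare or single-quoted attributes are left untouched, and markup-like text inside scripts or comments is not handled specially. The point is only that the decompression side stays plain zlib.

```python
import re
import zlib

# Hypothetical helper patterns: name="value" pairs, double-quoted values only.
ATTR = re.compile(r'[\w:-]+="[^"]*"')
# Opening tags that carry at least one such attribute.
TAG = re.compile(r'<([A-Za-z][\w:-]*)((?:\s+[\w:-]+="[^"]*")+)\s*(/?)>')

def precompress(html: str) -> str:
    """Rewrite each matching opening tag with its attributes in sorted order."""
    def fix(m):
        attrs = sorted(ATTR.findall(m.group(2)))
        return "<" + m.group(1) + " " + " ".join(attrs) + m.group(3) + ">"
    return TAG.sub(fix, html)

# Same element, attributes written in two different orders.
a = '<a href="/x" class="nav" id="home">Home</a>'
b = '<a id="home" class="nav" href="/x">Home</a>'

# The general-purpose compressor sees identical input after normalization.
packed_a = zlib.compress(precompress(a).encode("utf-8"))
packed_b = zlib.compress(precompress(b).encode("utf-8"))

# Any stock zlib decompressor recovers the normalized markup;
# it needs no knowledge of HTML.
restored = zlib.decompress(packed_a).decode("utf-8")
```

The receiver gets the normalized markup rather than the original bytes, which is fine as long as attribute order carries no meaning.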
> Courgette transforms the input into an alternate form where binary diffing is more effective, does the differential compression in the transformed space, and inverts the transform to get the patched output in the original format. With careful choice of the alternate format we can get substantially smaller updates.
I actually do just that with a pre-processor on my site; it's only a single line of Ruby with a regex, a split, a sort and a join.
The reason for doing it wasn't so much the compression benefit. Some of the nanoc code that generates the site did not always order the tags the same way, so rsync had to upload more than it needed to.
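For anyone curious what a regex, a split, a sort and a join can look like, here is a guess at the shape in Python rather than Ruby. This is not the actual line from the site; `sort_attrs` is a hypothetical name, and splitting on whitespace assumes no attribute value contains a space (so `class="a b"` would be mangled).

```python
import re

def sort_attrs(html: str) -> str:
    # regex: find "<tag attrs>" or "<tag attrs/>"
    # split: break the attribute string on whitespace
    # sort:  order the pieces lexicographically
    # join:  glue them back together with single spaces
    return re.sub(
        r'<(\w+)\s+([^<>]+?)\s*(/?)>',
        lambda m: "<%s %s%s>" % (
            m.group(1),
            " ".join(sorted(m.group(2).split())),
            m.group(3),
        ),
        html,
    )

# Two builds of the same page that differ only in attribute order.
one = '<link rel="stylesheet" href="/s.css">'
two = '<link href="/s.css" rel="stylesheet">'

stable_one = sort_attrs(one)
stable_two = sort_attrs(two)
```

Because both builds now produce the same bytes, a sync tool that compares file contents has nothing new to transfer for that page.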