Looking at the code for that, it looks like 1,500 lines of this:
double MaskDcB(double delta) {
  PROFILER_FUNC;
  static const double extmul = 0.349376011816;
  static const double extoff = -0.894711072781;
  static const double offset = 0.901647926679;
  static const double scaler = 0.380086095024;
  static const double mul = 18.0373825149;
  static const std::array<double, 512> lut =
      MakeMask(extmul, extoff, mul, offset, scaler);
  return InterpolateClampNegative(lut.data(), lut.size(), delta);
}
The code has hundreds of high-precision constants. Some even seem to be set to nonsensical values (like kGamma to 0.38). Where did they all come from? The real science here seems to be the method by which those constants were chosen, and I see no details on how it was done.

A constant lookup table is used for determining the importance of a change vs. distance. Separate tables are used for vertical and horizontal distances (I guess eyes might be slightly more sensitive to vertical edges than horizontal ones?).
Those tables are wildly different in magnitude:
static const double off = 1.4103373714040413; // First value of Y lookup table
static const double off = 11.38708334481672; // First value of X lookup table
However, later on, when those tables are used, another scale factor is used (simplified code):
static const double xmul = 0.758304045695;
static const double ymul = 2.28148649801;
The two constant scale factors directly multiply together, so there is no need for both. No human would manually calculate to 10 decimal places a number which had no effect. Hence, my theory is that these numbers have been auto-generated by some kind of hill-climbing algorithm.

http://disp.ee.ntu.edu.tw/meeting/%E7%B6%AD%E6%AF%85/An%20In...
and also read here:
https://en.wikipedia.org/wiki/YUV
That is my quick guess at how one might roughly derive the constants (because it is new, there are probably some fancy modifications, though :) )
I don't know if this code is related to that, but I'm just pointing out that seemingly nonsensical constants may appear more often than one would think.
Careful what you wish for!
What are you on these days? Image codecs still?
The original libjpeg code was written to try and change the Usenet News binary pictures groups over from GIF to JPEG (so that more images would fit down the rather narrow transatlantic pipe that I had at the time). The choice of license turned out to be a good one (it predated the GPL V2) -- who knows what would have happened if we (the precursor to the IJG) had chosen that one.
I believe using a full blown FFT and complex IQA metrics is too much. I have great results with custom quantization matrices, Mozjpeg trellis quantization, and a modification of PSNR-HVS-M, and there's still a lot of room for improvement.
...and generates a solution that uses far less bandwidth, especially after thousands or millions of hits, which is the real point of the exercise.
Cloud computing companies love this. They've got a lot of bored hardware to put to use. It's absolutely no surprise to see solutions like this coming from Google. Spending a dollar of compute time to save $1000 in bandwidth is a no-brainer win for a company with a million servers.
Specifically, looking at the cat's-eye example: there's a bit of green (a reflection?) in the lower pupil. In the original it is #293623 (green); in the libjpeg version it is #2E3230 (still green, slightly muted). But in the Guetzli-encoded image it is #362C35: still slightly green but quite close to grey.
In my experience people love to see colors "pop" in photos (and photography is where JPEG excels) - hopefully this is just an outlier and the majority of compressions with this tool don't lose color like this.
I suspect that if you give this algorithm twice the file size as a budget, that green color will come back.
It would definitely have its uses as such. Or maybe it's great all around and I just found one bad example?
TLDR:
> We didn't do a full human rater study between guetzli and mozjpeg, but a few samples indicated that mozjpeg is closer to libjpeg than guetzli in human viewing.
No directly-linkable PDF :P
http://manpages.ubuntu.com/manpages/xenial/man1/netpbm.1.htm...
I can see that if you try to render an incomplete file you might end up "wasting" effort blitting it to the screen and so on before the rest of the data is decoded. But if that's a concern, can't one simply rearrange the data back to scanline order and decode as normal?
I'm still hoping for a Google Now that understands Swiss German :)
(Full disclosure: I am a programmer and I try to match programmers with Zurich's startups for a living.)
So, if you want to move to Zurich, you'll find my e-mail address in my HN handle. Read more about Switzerland in my semi-famous blog post "8 reasons why I moved to Switzerland to work in tech": https://medium.com/@iwaninzurich/eight-reasons-why-i-moved-t...
Someone out there must have tried this.
[1]:https://blogs.dropbox.com/tech/2016/07/lepton-image-compress...
http://www.streamingmedia.com/Articles/Editorial/-110383.asp...
But I'm still skeptical, because HEVC might be more widespread, with hardware decoders everywhere, and people might just not care enough to move to the new standard. If MPEG-LA exploits its dominance really badly with license fees, then we can expect MPEG codecs to die off. Although I think x264 will still live.
[1]: https://fosdem.org/2017/schedule/event/om_av1/attachments/sl...
(4th slide. This codec will be standardized after the experiments are removed and it is frozen to test software.)
As an example of the latter:
I think Opera Mini (which I ran the development of for its first decade) still has somewhere around 150-200 million monthly unique users, down from a peak of 250M. Pretty much all of those users would be quite happy to receive this image quality improvement for free, I think. (Assuming the incremental encoding CPU cost isn't prohibitive for the server farms.) Opera Mini was a "launch user" of webp (smartphone clients only) for this particular reason.
Many of those users' devices are Nokia/Sony Ericsson/etc. J2ME devices with no realistic way of ever getting system-level software updates. They are still running some circa-2004 libjpeg version to actually decode the images. It's still safe, because the transcoding step in Opera Mini means that they aren't exposed to modern bitstream-level JPEG exploits from the current web, but it underscores why any improvements targeting formats like JPEG are still quite useful.
Opera Mini for J2ME actually includes a very tiny (like 10k, iirc) Java-based JPEG decoder, since quite a few devices back then didn't support JPEG decoding inside the J2ME environment. It's better than having to use PNG for everything, but because it's typically 5x-10x slower than the native/C version even in a great JVM of the time, it really only makes sense to use as a fallback.
There is no way we will start paying royalties to show images on the web.
https://agileblaze.com/google-guetzli-image-compression-setu...
But with -DGFLAGS_NAMESPACE=google I get a gflags linking error on both 2.1 and 2.2: "google::SetUsageMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)". Without this flag it works fine. This is atrocious.
Guess I still have some old incompatible gflags headers around.
EDIT: The fix on MacPorts, with 2.2 installed by default into /usr/local, is:
make CXX=g++-mp-6 verbose=0 LDFLAGS="-L/usr/local/lib -L/opt/local/lib -v" CFLAGS=-I/usr/local/include
i.e. enforce gflags 2.2 over the system 2.1.
The real problem, as far as I can see, is that JPEG2000 is really slow to decode due to its complexity.
The blog makes it sound like that's the target but the paper has this line:
"Our results are only valid for high-bitrate compression, which is useful for long-term photo storage."
Do the authors think the size/quality benefits still show up when targeting the lower bitrates/qualities that are more common on the web? Do they intend to try to prove it?
Another limitation is that Guetzli runs very slowly, which limits it along a further axis: in its current form it cannot be applied to a huge corpus of images. Perhaps that still covers half of the images on the internet.
So, let's say that Guetzli is 25% relevant to the web pages.
1) YUV420 vs YUV444. Guetzli practically always goes for YUV444.
2) Choosing quantization matrices.
3) After normal quantization, choose even more zeros. JPEG encodes zeros very efficiently.
When doing the above, increase the errors where they matter least (certain RGB values hide errors in certain components, and certain types of visual noise hide other kinds of noise).
On top of that, you could tweak the quantized values themselves to make them more compressible.
The more bits you lose during quantization, the more ringing and artifacts you can expect after the IDCT process.
So the tradeoff is quite literally artifacts for smaller size.
This compressor seems to be cleverer about where to lose data than libjpeg.
https://github.com/google/guetzli/releases
and I'm getting "Invalid input JPEG file" from a lot of images unfortunately.
https://github.com/google/guetzli/issues/40#issuecomment-287...
echo Drag and drop one or multiple jpg or png files onto this batch file, to compress with google guetzli using a psychovisual model
if [%1]==[] goto :eof
:loop
echo compressing...
guetzli_windows_x86-64.exe -quality 84 %1 "%~dpn1_guetzlicompressed%~x1"
shift
if not [%1]==[] goto loop
echo DONE
I'm concerned about the color changes that are clearly visible throughout the whole image.