
0 points · sn41 · 7y ago · 0 comments

I actually sympathize with the theoretician (disclaimer: I work in information-theoretic areas). Information theory is easy to motivate at a first cut, but if you want to really understand it, then there are some hairy issues. There is a lot of slip between the cup and the lip when it comes to information theory (Shannon himself made several serious errors in his original 1948 paper which took decades to fully work out).

Many seemingly "obvious" facts in information theory are tricky to show. Some examples:

(1) From the article: cross entropy is always greater than or equal to entropy, since we are coding with the wrong distribution. How do you show this? For any two probability vectors (p,q), can we say H(p) <= H(p,q)? Every proof I know involves some delicate use of Jensen's inequality. (By the way, I feel that the author's notation is non-standard. H(p,q) usually stands for the joint entropy, which is quite different.)

(2) Another famous fact about entropy: conditioning never increases entropy. For any two random variables X, Y, we have H(X|Y) <= H(X) and H(Y|X) <= H(Y). This is called Shannon's inequality, and the proof involves a subtle trick.

(3) You can easily show that if p=q, then KL(p||q)=0. But it is also true that if KL(p||q)=0, then p=q. This converse is quite tricky, and it used to appear as a question in Ph.D. qualifying exams.
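A quick numerical sanity check of the three facts, as a minimal sketch and not a proof. It assumes finite alphabets and strictly positive random distributions, and it uses H(p,q) for cross entropy as in the article. The helper names (`entropy`, `cross_entropy`, `kl`, `random_dist`, `check_facts`) are made up for illustration.

```python
import math
import random

def entropy(p):
    """Shannon entropy H(p) in bits; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy (the article's H(p,q)); assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """KL(p||q) in bits, under the same support assumption."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_dist(n, rng):
    """A random, strictly positive probability vector of length n."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def check_facts(trials=1000, n=4, m=3, seed=0, tol=1e-9):
    """Return True if no random trial violates facts (1), (2), or the easy half of (3)."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q = random_dist(n, rng), random_dist(n, rng)
        # (1) entropy never exceeds cross entropy
        if entropy(p) > cross_entropy(p, q) + tol:
            return False
        # (3), easy direction: KL(p||p) = 0
        if abs(kl(p, p)) > tol:
            return False
        # (2) H(X|Y) = H(X,Y) - H(Y) <= H(X) for a random joint distribution
        flat = random_dist(n * m, rng)
        joint = [flat[i * m:(i + 1) * m] for i in range(n)]  # joint[x][y]
        px = [sum(row) for row in joint]
        py = [sum(row[j] for row in joint) for j in range(m)]
        if entropy(flat) - entropy(py) > entropy(px) + tol:
            return False
    return True

if __name__ == "__main__":
    print(check_facts())
```

Random trials can only fail to find a counterexample. In particular they say nothing about the hard direction of (3), which needs the equality condition in Jensen's inequality.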


4 comments · 2 top-level

FiatLuxDave · 7y ago · 2 in thread

I have read Shannon's 1948 paper a few times, and while I have noticed one or two errors, I highly doubt that I know all of them. Is there any chance you could point me in the right direction to research the several serious errors you mention? I'd greatly appreciate the pointers.

sn41 (OP) · 7y ago

A good introductory lecture is by Emre Telatar of EPFL. It's a great talk, and presents a unique view on Shannon's paper. [Of course, I am assuming that _you_ are not Emre :) ] It mentions some of the errors in Shannon's paper:

https://www.youtube.com/watch?v=9FlHZwEpvPE&feature=youtu.be

There are some more errors in his formulation of what eventually came to be known as the Shannon-McMillan-Breiman theorem. These are the errors I know of; there may be more.

The greatness of the paper is its revolutionary conception of a new area ab initio. It contains errors, but that is overshadowed by what it achieved and brought forth.

FiatLuxDave · 7y ago

Thank you!

srean · 7y ago

What is delicate about Jensen's inequality in (1)? As long as you have the Radon-Nikodym derivatives that are needed in the definition of KL, it seems pretty straightforward.
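For reference, here is a sketch of the Jensen step in the discrete case only, assuming q_i > 0 wherever p_i > 0 and writing H(p,q) for cross entropy as in (1). The places where care is needed are the restriction to the support of p and the equality condition.

```latex
\begin{align*}
H(p) - H(p,q)
  &= \sum_{i:\,p_i>0} p_i \log\frac{q_i}{p_i}
   = -\mathrm{KL}(p\,\|\,q) \\
  &\le \log\Bigl(\sum_{i:\,p_i>0} p_i \cdot \frac{q_i}{p_i}\Bigr)
   && \text{(Jensen, since $\log$ is concave)} \\
  &= \log\Bigl(\sum_{i:\,p_i>0} q_i\Bigr)
   \le \log 1 = 0 .
\end{align*}
```

Because log is strictly concave, equality in the Jensen step forces q_i/p_i to be constant on the support of p, and equality in the last step forces q to put all its mass on that support. Together these give p = q, which is the converse in (3).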
