
demallien · 14y ago · 0 points

Yeah, no. If I have a dictionary of 100000 words, then each word represents about 17 bits of entropy. If I have three words, that makes 3 x 17 = 51 bits of entropy.
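The arithmetic here can be checked with a short sketch. It assumes each word is drawn uniformly and independently from the list; the function name `dictionary_entropy_bits` is made up for illustration.

```python
import math


def dictionary_entropy_bits(wordlist_size: int, num_words: int) -> float:
    """Entropy in bits of num_words picked uniformly and independently
    from a list of wordlist_size words (assumption: truly random picks)."""
    return num_words * math.log2(wordlist_size)


# One word from a 100000-word list: about 16.6 bits, i.e. "about 17".
print(round(dictionary_entropy_bits(100_000, 1), 1))
# Three words: about 49.8 bits, or 51 if each word is rounded up to 17 bits.
print(round(dictionary_entropy_bits(100_000, 3), 1))
```

Rounding each word up to 17 bits before multiplying gives the 51-bit figure; without rounding it is closer to 49.8.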


3 comments · 1 top-level

xkcdentropy · 14y ago

You deny the fact that English text can be attacked separately from your dictionary. English text is very predictable: for example, e is much more common than most other letters, and q is almost certainly followed by u.

I'm not making this up on my own either. Please check out http://en.wikipedia.org/wiki/Entropy_%28information_theory%2.... Let me quote the important part: "The entropy rate of English text is between 1.0 and 1.5 bits per letter,[1] or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments.[2]"

Consider the following example. You have a wordlist of 100 000 words. It seems only normal that each word carries log(100 000)/log(2) = 16.6 bits of entropy. Now suppose you take three words out of that list completely at random and get the words "a no we". Assuming 16.6 bits of entropy per word, you do indeed have to search through a space of 49.8 bits, but only if you attack it via the dictionary.

It is clear that in this case you can do a different attack. Instead of brute forcing the words, you can brute force the characters on their own with a search space of a-z and space. This equals log(27^7)/log(2) or 33.2 bits, a lot less than the 49.8 bits estimated when only considering a dictionary approach. In reality English text is so predictable that you don't have to search even close to 33.2 bits of entropy if you brute force it with an algorithm that is aware of English text. Assuming Shannon's 1.3 bits per character estimate, this password has 9.1 bits of entropy.
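As a rough check of the two character-level figures, here is a small sketch. The function names are invented for illustration, and the 1.3 bits per character rate is simply the Shannon estimate quoted in this comment, applied as a flat multiplier.

```python
import math


def char_bruteforce_bits(password: str, alphabet_size: int = 27) -> float:
    """Bits searched when brute forcing every position over a-z plus space."""
    return len(password) * math.log2(alphabet_size)


def rate_estimate_bits(password: str, bits_per_char: float = 1.3) -> float:
    """Flat per-character estimate (assumption: the quoted rate applies)."""
    return len(password) * bits_per_char


password = "a no we"  # 7 characters including the two spaces
print(round(char_bruteforce_bits(password), 2))  # about 33.28, quoted as 33.2
print(round(rate_estimate_bits(password), 1))    # 9.1
```

Both numbers depend only on the password length (7 characters), which is why the example with very short words favors the character-level attack.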

I understand that this is an edge case with very short words. But I chose it to show that there are other ways to attack the password, here by using a 27-character alphabet instead of the word dictionary. This is cold hard math and therefore much easier to accept than the magic entropy estimation of English text. Once you see that this approach can reduce your entropy calculation, it's not that hard to accept that there might be more ways to reduce the entropy even further.

demallien (OP) · 14y ago

1) The predictability of the distribution of characters in the English language has nothing to do with this type of password: the symbols aren't characters, but words. 2) That figure of 1.3 bits of entropy per character only applies to English text, and the figure is low because there are a bunch of small words, like "and" and "the", that are regularly repeated. The entropy per character for words containing 6 letters or more, not arranged in sentences, is a lot higher, like about double if I recall correctly. So sure, just as I can expect to get brute-forced if I choose a PIN of 0000, I can get brute-forced if I choose a passphrase of 'and the in'. Good luck forcing "queens examine faulty charges" though.
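One way to put the two positions side by side is to assume the attacker simply picks whichever search space is smaller. The sketch below does that under loud assumptions: words are drawn uniformly from a 100000-word list, the character attack is a plain uniform search over a-z plus space, and English-aware models (the disputed part of this thread) are left out. All names are hypothetical.

```python
import math

WORDLIST_SIZE = 100_000  # assumed dictionary size, as used in this thread
ALPHABET_SIZE = 27       # a-z plus space


def dictionary_bits(phrase: str) -> float:
    """Cost of guessing the phrase word by word from the dictionary."""
    return len(phrase.split()) * math.log2(WORDLIST_SIZE)


def character_bits(phrase: str) -> float:
    """Cost of guessing the phrase character by character, uniformly."""
    return len(phrase) * math.log2(ALPHABET_SIZE)


def cheapest_attack(phrase: str) -> tuple[str, float]:
    """Return the name and cost in bits of the smaller search space."""
    costs = {
        "dictionary": dictionary_bits(phrase),
        "characters": character_bits(phrase),
    }
    name = min(costs, key=costs.get)
    return name, costs[name]


for phrase in ("a no we", "queens examine faulty charges"):
    name, bits = cheapest_attack(phrase)
    print(f"{phrase!r}: {name} attack, about {bits:.1f} bits")
```

Under these assumptions the character attack wins for "a no we", while for the 29-character phrase the dictionary attack is far cheaper, so its cost stays at the dictionary estimate of four words.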

xkcdentropy · 14y ago

I am merely suggesting that the entropy can be less than what is estimated by looking only at the dictionary.

Re 1: Words still consist of characters. Re 2: Certainly correct, but ignoring the possibility that English words have less entropy than it appears at first is odd, given the patterns English words often follow.

I'm interested in reading more about those entropy estimations. Can you recall where you read about them? According to Applied Cryptography, Shannon states that entropy per letter decreases as the text grows. Shannon estimates 2.3 bits per letter for chunks of 8 letters, but it drops down to between 1.3 and 1.5 bits per letter for 16-letter chunks.

Applied Cryptography cites a paper by Shannon called "Prediction and Entropy of Printed English" in the Bell System Technical Journal from 1951. I have not personally read it yet but will try to find it in the near future.

