I'm not making this up on my own either. Please check out http://en.wikipedia.org/wiki/Entropy_%28information_theory%2.... Let me quote the important part: "The entropy rate of English text is between 1.0 and 1.5 bits per letter,[1] or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments.[2]"
Consider the following example: You have a wordlist of 100 000 words. It seems only natural that each word carries log(100 000)/log(2), or about 16.6, bits of entropy. Now suppose you take three words out of that list completely at random and get "a no we". Assuming 16.6 bits of entropy per word, an attacker does indeed have to search through a space of 49.8 bits, but only if they attack it via the dictionary.
It is clear that in this case you can mount a different attack. Instead of brute forcing the words, you can brute force the individual characters with a search space of a-z and space. The password "a no we" is 7 characters long including the spaces, so this equals log(27^7)/log(2), or about 33.3 bits. That is a lot less than the 49.8 bits estimated when only considering a dictionary approach. In reality English text is so predictable that you don't have to search anywhere close to 33.3 bits if you brute force it with an algorithm that is aware of English text. Assuming Shannon's estimate of 1.3 bits per character, this password has 9.1 bits of entropy.
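To make the arithmetic easy to check, here is a minimal Python sketch of the three estimates. The variable names and the bits() helper are made up for illustration; the numbers (100 000 words, 27 symbols, 1.3 bits per character) are the ones from the example, and applying Shannon's per-character figure to three randomly drawn words is the assumption made in this comment, not an established result.

```python
import math

def bits(search_space_size: int) -> float:
    """Entropy in bits of one uniform choice among search_space_size options."""
    return math.log2(search_space_size)

password = "a no we"          # the three randomly drawn words from the example
wordlist_size = 100_000       # dictionary size used in the example
alphabet_size = 27            # a-z plus space
shannon_bits_per_char = 1.3   # upper end of the Shannon estimate quoted above

word_count = len(password.split())   # 3 words
char_count = len(password)           # 7 characters, spaces included

# Attack 1: enumerate combinations of dictionary words.
dictionary_bits = word_count * bits(wordlist_size)

# Attack 2: enumerate every string of this length over a-z and space.
charset_bits = char_count * bits(alphabet_size)

# Attack 3 (assumption): a guesser that models English at 1.3 bits/char.
shannon_bits = char_count * shannon_bits_per_char

# The attacker is free to pick whichever enumeration is cheaper.
cheapest_enumeration_bits = min(dictionary_bits, charset_bits)

print(f"dictionary attack:       {dictionary_bits:.1f} bits")
print(f"a-z + space brute force: {charset_bits:.1f} bits")
print(f"Shannon-based estimate:  {shannon_bits:.1f} bits")
print(f"cheaper of the first two: {cheapest_enumeration_bits:.1f} bits")
```

Swapping in a longer passphrase (say, three ten-letter words) flips the comparison: the character-level search space then exceeds the dictionary one, which is why this is an edge case.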
I understand that this is an edge case with very short words. But I chose it to show that there are other ways to attack the password, in this case by using a 27 character "dictionary". This is cold hard math and therefore much easier to accept than the magic entropy estimation of English text. Once you see that this approach can reduce your entropy calculation, it's not that hard to accept that there might be more ways to reduce the entropy even further.
Re 1: Words still consist of characters.
Re 2: Certainly correct, but to ignore the possibility of English words having less entropy than it appears at first is odd given the patterns English words often follow.
I'm interested in reading more about those entropy estimations. Can you recall where you read about them? According to Applied Cryptography, Shannon states that the entropy per letter decreases as the text grows. Shannon estimates 2.3 bits per letter for chunks of 8 letters, but it drops to between 1.3 and 1.5 bits per letter for 16-letter chunks.
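As a rough illustration of what those per-letter figures imply, the sketch below multiplies them out to a total per chunk. It assumes the quoted numbers can be read as an average over the whole chunk, which is my reading and not something the book excerpt above confirms; the names are made up for the example.

```python
# Per-letter estimates as quoted above from Applied Cryptography:
# chunk length in letters -> (low, high) bits per letter.
estimates = {
    8: (2.3, 2.3),
    16: (1.3, 1.5),
}

def total_bits(length, per_letter_range):
    """Total bits for a chunk, assuming the per-letter figure is an average."""
    low, high = per_letter_range
    return length * low, length * high

totals = {length: total_bits(length, rng) for length, rng in estimates.items()}

for length, (low, high) in totals.items():
    print(f"{length:2d} letters: {low:.1f} to {high:.1f} bits in total")
```

Under that assumption, doubling the chunk from 8 to 16 letters adds only a few bits in total, far fewer than the 8 x 2.3 bits you would expect if every additional letter were as unpredictable as the first eight.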
Applied Cryptography cites a paper by Shannon called "Prediction and Entropy of Printed English", published in the Bell System Technical Journal in 1951. I have not personally read it yet but will try to find it in the near future.