http://www.unicode.org/versions/Unicode5.0.0/appC.pdf
See section C.3 for the differences between the Unicode and ISO/IEC 10646 definitions of UTF-8. The key paragraph is:
"The definition of UTF-8 in Annex D of ISO/IEC 10646:2003 also allows for the use of five- and six-byte sequences to encode characters that are outside the range of the Unicode character set; those five- and six-byte sequences are illegal for the use of UTF-8 as an encoding form of Unicode characters. ISO/IEC 10646 does not allow mapping of surrogate code positions, known as RC-elements in that standard; that restriction is identical to the restriction for the Unicode definition of UTF-8."
That's where the extra characters come from: the UTF-8 encoder Twitter uses is apparently not checking whether the code points are valid Unicode characters, as Unicode's UTF-8 requires. That range check is the extra restriction the passage above refers to. The Unicode version of UTF-8 imposes it; the ISO version does not. As quoted in the article, http://en.wikipedia.org/wiki/UTF-8#Description says that the ISO version of UTF-8 can encode 31 bits. I couldn't find a source for the encoding of ISO UTF-8.
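For illustration, here is a rough sketch of the unrestricted 31-bit layout next to the check that Unicode's UTF-8 adds. One place the six-byte layout is spelled out is the older RFC 2279 (RFC 3629 later cut UTF-8 down to U+10FFFF and four bytes); I'm assuming that matches the ISO Annex D scheme, since I haven't seen the ISO text itself. The function names `encode_utf8_31bit` and `is_unicode_scalar` are made up for this example.

```python
def encode_utf8_31bit(cp: int) -> bytes:
    """Encode a value of up to 31 bits with the old 1- to 6-byte UTF-8 layout.

    Deliberately does no surrogate or U+10FFFF check, i.e. the looser scheme
    (assumed here to match ISO/IEC 10646 Annex D), not Unicode UTF-8.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])  # ASCII: a single byte
    # (exclusive upper limit, sequence length, lead-byte marker bits)
    for limit, n, lead in ((0x800, 2, 0xC0), (0x10000, 3, 0xE0),
                           (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8),
                           (0x80000000, 6, 0xFC)):
        if cp < limit:
            break
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))  # each continuation byte carries 6 bits
        cp >>= 6
    out.append(lead | cp)  # remaining high bits go into the lead byte
    return bytes(reversed(out))


def is_unicode_scalar(cp: int) -> bool:
    """The extra Unicode rule: at most U+10FFFF and not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


for cp in (0x20AC, 0x10FFFF, 0x110000, 0x7FFFFFFF):
    print(hex(cp), encode_utf8_31bit(cp).hex(" "), is_unicode_scalar(cp))
```

For U+20AC and U+10FFFF the bytes match a normal UTF-8 encoder. 0x110000 still gets a well-formed-looking four-byte sequence (`f4 90 80 80`) and 0x7FFFFFFF gets six bytes, but `is_unicode_scalar` rejects both. A Unicode-conforming encoder has to refuse exactly those values.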
The other part I wasn't sure about at first was where the 1,112,064 possible characters figure came from. It turns out that's the 17 Unicode planes of 65,536 code points each, less the 2,048 code points from 0xD800 to 0xDFFF reserved for surrogate pairs.
In other words:
1+0x10ffff-(0xdfff-0xd800+1) = 1 112 064
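The arithmetic is easy to double-check in Python, both from the formula and by brute force. The brute-force count relies on Python's built-in UTF-8 codec refusing lone surrogates, which is the Unicode restriction discussed above.

```python
# Formula: every code point from 0 to 0x10FFFF, minus the surrogate block.
total = 0x10FFFF + 1                 # 17 planes * 65,536 = 1,114,112
surrogates = 0xDFFF - 0xD800 + 1     # 2,048
scalars = total - surrogates
print(f"{scalars:,}")

# Brute force: count the code points Python's strict UTF-8 encoder accepts.
encodable = 0
for cp in range(total):
    try:
        chr(cp).encode("utf-8")
        encodable += 1
    except UnicodeEncodeError:  # raised for the surrogate range only
        pass
print(f"{encodable:,}")
```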
"Alyssa P Hacker" is a pun on "A Lisp Hacker". She has a friend named Imelda Macros.
It got a lot of attention on Reddit last week (20k visitors).