> Is this claim actually true?
Yes, it is -- but it depends on how we define "clipped".
> My understanding is that if you take a waveform and clip it, the resulting waveform actually carries less energy (think of the corresponding integral)
Only if the clipping reduces the peak value. If you compare a sine wave with a peak value of 1 and a square wave with a peak value of 1, the square wave has a substantially higher average (rectified) level, by a ratio of pi/2.
> It's this – the unexpectedly large amount of high-frequency energy – that kills speakers because their crossover networks push it into the tiny, tiny tweeters, and they are utterly unprepared for it.
Yes -- the rate at which the speaker cones are required to move is an additional factor. But when "clipping" means the amplifier is being driven past its available supply voltage, the two effects add: the waveform carries more total energy, and more of that energy is at high frequencies.
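A rough way to put numbers on the high-frequency part: the sketch below models an idealized amplifier that hard-limits its output at +/-1 and estimates how much of the output power sits above the fundamental. The names `clipped_sine` and `harmonic_power_fraction` and the gain values are illustrative choices, not anything from the thread, and the model ignores real amplifier behaviour and the crossover itself.

```python
import math

def clipped_sine(x, gain):
    """Sine of amplitude `gain`, hard-clipped to the range [-1, 1]."""
    return max(-1.0, min(1.0, gain * math.sin(x)))

def harmonic_power_fraction(gain, n=100_000):
    """Fraction of total power above the fundamental (midpoint rule, one period)."""
    dx = 2 * math.pi / n
    total = 0.0  # accumulates v^2, becomes the mean square
    b1 = 0.0     # accumulates v*sin(x), becomes the fundamental's amplitude
    for i in range(n):
        x = (i + 0.5) * dx
        v = clipped_sine(x, gain)
        total += v * v
        b1 += v * math.sin(x)
    total /= n
    b1 *= 2.0 / n                # (1/pi) * integral of v*sin(x) over one period
    fundamental = b1 * b1 / 2.0  # mean-square power of b1*sin(x)
    # The waveform is odd, so the fundamental has no cosine component.
    return max(0.0, 1.0 - fundamental / total)

for gain in (1, 2, 4, 100):
    pct = 100 * harmonic_power_fraction(gain)
    print(f"gain {gain:>3}: {pct:5.1f}% of power in harmonics")
```

At a gain of 1 nothing is clipped and the harmonic fraction is zero. As the gain grows the output approaches a square wave, for which the fraction tends to 1 - 8/pi^2, roughly 19%.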
http://i.imgur.com/oE5NFZ9.png
In the image linked above, the red trace is sin(x); its integral over the interval 0 < x < pi is 2. The green trace has an integral of pi. The ratio of the two is pi/2. Speaker power goes as the square of the applied voltage, so what matters for power is the mean of the squared voltage: 1/2 for the red trace and 1 for the green one, a power ratio of 2 (about 3 dB).
The green trace is what you would get if you simply turned up the volume beyond any reasonable setting -- the amplifier produces a clipped version of the sine wave and the peak value is equal to the supply voltage.
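The effect of turning the volume up past the clipping point can be checked numerically. This is a minimal sketch under stated assumptions: an ideal amplifier that hard-limits at a supply voltage of 1, a purely resistive load, and power taken as proportional to the mean of the squared voltage. The function names and gain values are made up for illustration.

```python
import math

def clipped_sine(x, gain):
    """Sine of amplitude `gain`, hard-clipped to the range [-1, 1]."""
    return max(-1.0, min(1.0, gain * math.sin(x)))

def half_cycle_stats(gain, n=100_000):
    """Return (integral of v, mean of v^2) over 0 < x < pi, midpoint rule."""
    dx = math.pi / n
    area = 0.0
    mean_sq = 0.0
    for i in range(n):
        v = clipped_sine((i + 0.5) * dx, gain)
        area += v * dx
        mean_sq += v * v / n
    return area, mean_sq

# gain = 1 is the unclipped reference (the red trace).
ref_area, ref_power = half_cycle_stats(1.0)
for gain in (1, 2, 4, 100):
    area, power = half_cycle_stats(gain)
    print(f"gain {gain:>3}: integral {area:.3f} "
          f"(x{area / ref_area:.2f}), power x{power / ref_power:.2f}")
```

As the gain increases, the integral climbs from 2 toward pi and the power relative to the unclipped sine climbs from 1 toward 2, which is the square-wave limit.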