You can implement merge sort in place too:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22....
Looks like the algo is more complicated, but surely implementable. In most modern architectures, the key for performance is how one exploits the caching behavior. Quicksort is very cache friendly, as well as the common implementation of merge sort. Not so for heapsort.