Well, in most cases where I've looked at generated assembly (not that often), the xmm registers were used even for scalar operations. I thought that was the default for gcc on x86-64, but I suppose it might differ between systems (or perhaps 32-bit mode was used for some reason).
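For anyone who wants to check this on their own machine, here is a minimal sketch. The function name `scale_add` and the file name `scale.c` are made up for illustration and are not from the code discussed above.

```c
/* scale.c: hypothetical test function, used only to inspect
 * which registers the compiler picks for scalar double math. */
double scale_add(double a, double b, double c)
{
    /* One scalar multiply and one scalar add on doubles. */
    return a * b + c;
}
```

Compiling with `gcc -O2 -S scale.c` on x86-64 should normally give scalar SSE instructions such as `mulsd` and `addsd` working on `%xmm` registers, since `-mfpmath=sse` is the default there. With `gcc -O2 -m32 -S scale.c` (assuming 32-bit support is installed), gcc defaults to the x87 unit, so I'd expect `fld`/`fmul`/`fadd`-style instructions instead, unless something like `-msse2 -mfpmath=sse` is passed. The exact output will vary with compiler version and flags.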