I don't really think this is as crazy as the article is implying.
For example, MIPS has no actual "load immediate" instruction. Say you want to load the value 0x12345678 into register $t0. You can go ahead and tell your MIPS assembler:
  li $t0, 0x12345678

but it will actually emit something like:

  lui $t0, 0x1234
  ori $t0, $t0, 0x5678

"lui" here is the "load upper immediate" instruction, which loads the target register with the 16-bit immediate shifted left by 16 bits, leaving the lower 16 bits zero. "ori $t0, $t0, 0x5678" then performs a bitwise OR with its immediate to get the lower 16 bits into the register. If the value fits in 16 bits (ori zero-extends its immediate), this can be done in a single instruction by using the dedicated zero register (a read-only register that always contains 0x00000000, which turns out to be extremely handy for minimizing an instruction set):

  ori $t0, $zero, 0xabcd

Some architectures manage to do something more clever than that, as well. See http://alisdair.mcdiarmid.org/arm-immediate-value-encoding/ for instance.
Also, there is a 15-byte instruction length limit on x86 that would have to be raised if 64-bit immediates were allowed on all instructions.
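A minimal C sketch of the two constraints at play, under stated assumptions. On x86-64, ALU instructions such as `add` accept at most a 32-bit immediate, which is sign-extended to 64 bits (only `mov r64, imm64` takes a full 64-bit immediate), so `fits_simm32` checks whether a constant can be used directly. `hypothetical_add_imm64_length` tallies the bytes a memory-destination `add` would need if its immediate were widened to 64 bits. That widened encoding does not exist; the field sizes are assumed from the existing `add r/m64, imm32` form (REX.W 81 /0). Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum { X86_MAX_INSN_LENGTH = 15 };  /* architectural limit, in bytes */

/* True if value survives being stored as a sign-extended 32-bit immediate. */
bool fits_simm32(int64_t value) {
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Hypothetical "add qword [base+index*scale+disp32], imm64":
   the existing imm32 form with its immediate widened to 8 bytes. */
int hypothetical_add_imm64_length(void) {
    const int rex = 1;     /* REX.W prefix */
    const int opcode = 1;  /* 81 */
    const int modrm = 1;
    const int sib = 1;
    const int disp32 = 4;
    const int imm64 = 8;   /* the widened immediate */
    return rex + opcode + modrm + sib + disp32 + imm64;
}
```

Under these assumptions `fits_simm32(8589934592)` is false, which is why adding 2^33 needs either a separate 64-bit load or a memory operand, and the hypothetical encoding comes to 16 bytes, already over the limit before any legacy prefix such as `lock` is added.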
It seemed like slightly more of a chore than on x86-64, where many instructions can take a memory operand.
This can in fact be done with a memory operand. I'm not sure how the performance compares to a 64-bit immediate load followed by an add, but this will do it (NASM syntax):

  add rax, [rel the_constant]
  ...
  the_constant:
      dq 8589934592    ; 2^33

This is what ARM does a lot of the time (its "immediate" value operations let you pick an 8-bit number and rotate it right by an even number of bits).
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....
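The rotated-immediate scheme mentioned above can be sketched in C as an encodability check. This assumes the classic A32 data-processing encoding, where the 12-bit field holds an 8-bit value rotated right by twice a 4-bit rotation count; Thumb-2 uses a different scheme. `arm_encode_imm` is a hypothetical name, and a real assembler may pick a different (imm8, rot) pair when more than one works.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left; the guard avoids an undefined shift by 32 when r == 0. */
static uint32_t rotl32(uint32_t x, unsigned r) {
    r &= 31;
    return r ? (x << r) | (x >> (32 - r)) : x;
}

/* Find imm8 and rot such that value == ROR(imm8, 2 * rot).
   Undoing a right rotation is a left rotation by the same amount,
   so try every even rotation and see if the result fits in 8 bits. */
bool arm_encode_imm(uint32_t value, uint32_t *imm8, uint32_t *rot) {
    for (uint32_t r = 0; r < 16; r++) {
        uint32_t candidate = rotl32(value, 2 * r);
        if (candidate <= 0xFFu) {
            *imm8 = candidate;
            *rot = r;
            return true;
        }
    }
    return false;  /* needs a different approach, e.g. a load from memory */
}
```

For example, 0x104 encodes as imm8 = 0x41 with rot = 15 (0x41 rotated right by 30 bits), and 0xFF000000 as imm8 = 0xFF with rot = 4, while 0x101 and 0x12345678 cannot be encoded this way at all.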
  48 C7 C0 FF FF FF FF    mov RAX,0FFFFFFFFh

Thanks for mentioning that.
The obvious instruction for loading a (64-bit) float into an xmm register is movsd. With a memory source operand, the higher part of the register is zeroed, which is what you want. No problem.
Now the fun part: if the source is not memory but another xmm register, the higher part of the register is not zeroed. This induces a false dependency on the previous value of the destination register that can cause performance issues. To avoid this problem, such register-register copies should be done with a packed move instruction. (Or vmovsd, but that was added much later.)
The obvious packed move instruction for 64-bit floats is movapd, but we can do better than that by using movaps: it is still a floating-point-domain instruction, but it is one byte shorter because it lacks the 66 prefix (0F 28 /r versus 66 0F 28 /r).
So the optimal way to move a single double from one register to another is to use a vector move of the wrong type.
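The zeroing-versus-merging behavior of movsd described above can be observed from C through SSE2 intrinsics. This is a minimal sketch assuming an x86 or x86-64 target with SSE2: `_mm_load_sd` corresponds to movsd from memory and zeroes the upper lane, while `_mm_move_sd(a, b)` takes the lower lane from `b` and keeps the upper lane of `a`, which is exactly the dependency on the old destination. The compiler is free to implement `_mm_move_sd` with a different instruction, so this shows lane semantics rather than instruction selection; `load_low` and `move_low` are hypothetical helper names.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */

/* Like movsd xmm, m64: lower lane loaded, upper lane zeroed. */
void load_low(const double *src, double out[2]) {
    __m128d r = _mm_load_sd(src);
    _mm_storeu_pd(out, r);  /* out[0] = lower lane, out[1] = upper lane */
}

/* Like movsd xmm, xmm: lower lane from b, upper lane kept from a.
   Keeping a's upper lane is the dependency on the old destination. */
void move_low(double a_lo, double a_hi, double b_lo, double b_hi,
              double out[2]) {
    __m128d a = _mm_set_pd(a_hi, a_lo);  /* _mm_set_pd takes (upper, lower) */
    __m128d b = _mm_set_pd(b_hi, b_lo);
    _mm_storeu_pd(out, _mm_move_sd(a, b));
}
```

Loading 3.5 through `load_low` yields {3.5, 0.0}, while `move_low(1.0, 2.0, 5.0, 6.0, out)` yields {5.0, 2.0}: the result still carries the destination's old upper half. A full-register copy (what movaps or movapd does) simply replaces both lanes, so it has no such dependency.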