Better HN

userbinator · 2y ago · 0 points

The upper limit of instruction density on x86 is much higher than any RISC.
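As a rough illustration of where that upper limit comes from: in 16- and 32-bit mode x86 has many common one-byte instructions, while RV32C and Thumb never go below 16 bits. The sketch below only tabulates a few encodings (an arbitrary, illustrative selection) and compares the best-case instructions per byte. It says nothing about what a compiler typically emits.

```python
# One-byte x86 encodings, valid in 16- and 32-bit mode (names in 16-bit form).
# In 64-bit mode 0x40-0x4F are REX prefixes, so the one-byte INC/DEC forms vanish.
ONE_BYTE_X86 = {
    "inc ax": bytes([0x40]),
    "push ax": bytes([0x50]),
    "pop ax": bytes([0x58]),
    "xchg ax, bx": bytes([0x93]),
    "lodsb": bytes([0xAC]),
    "stosb": bytes([0xAA]),
    "cbw": bytes([0x98]),
    "ret": bytes([0xC3]),
}

# RV32C and Thumb both have a 16-bit (2-byte) minimum instruction size.
RISC_MIN_BYTES = {"rv32c": 2, "thumb": 2}


def shortest(encodings: dict[str, bytes]) -> int:
    """Length in bytes of the shortest encoding in the table."""
    return min(len(enc) for enc in encodings.values())


if __name__ == "__main__":
    x86_min = shortest(ONE_BYTE_X86)
    risc_min = min(RISC_MIN_BYTES.values())
    # Best case: how many instructions fit in one byte of code.
    print(f"x86: {1 / x86_min:.1f} insn/byte, RISC: {1 / risc_min:.1f} insn/byte")
```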

rwmj · 2y ago

Particularly in the case of this x86 ray tracer, because the core of it is written using the stack-based x87 floating point instructions.

kragen · 2y ago

in general i find that stack-based instruction sets are more compact than register-based or belt-based instruction sets, despite using almost twice the number of instructions to get anything done, because the individual instructions are smaller. in smalltalk or the jvm or emacs lisp bytecode, most instructions are a single byte, though occasionally you'll have a jump or something that has a following operand byte. (python is much worse at this, because python was never meant to run on 16-bit machines.) on the mup21, each instruction is 5 bits; on the x18, each instruction averages 4½ bits

so, in the space of a single 16-bit thumb or rv32c instruction, you can fit 1.8–2.8 instructions, and that makes a big difference
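A minimal sketch of how a byte-per-op stack bytecode gets that density, assuming a hypothetical encoding (not the real JVM, Smalltalk or Emacs Lisp one): small local-variable indexes and small constants are packed into the opcode byte itself, and only `LIT` carries an extra operand byte, the way a jump carries its target. The opcode names and values are invented for illustration.

```python
# Hypothetical one-byte-per-op stack bytecode, for illustration only.
LOCAL = 0x00  # 0x00-0x0F: push local variable n (n packed into the opcode)
SMALL = 0x10  # 0x10-0x1F: push the constant n (n packed into the opcode)
ADD, LSHIFT, FETCH, LIT = 0x20, 0x21, 0x22, 0x23  # LIT takes one operand byte


def run(code: bytes, local_vars: list[int], memory: dict[int, int]) -> list[int]:
    """Interpret `code` and return the final operand stack."""
    stack: list[int] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op < SMALL:  # push local variable
            stack.append(local_vars[op - LOCAL])
        elif op < ADD:  # push small constant
            stack.append(op - SMALL)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == LSHIFT:
            n, a = stack.pop(), stack.pop()
            stack.append(a << n)
        elif op == FETCH:  # Forth-style @: replace an address with the word there
            stack.append(memory[stack.pop()])
        elif op == LIT:  # operand lives in the next byte, as a jump target would
            stack.append(code[pc])
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op:#04x}")
    return stack


frame = [0, 5, 0, 0, 0, 100]  # locals standing in for a1 = 5 and a5 = 100
a1_plus_1 = bytes([LOCAL + 1, SMALL + 1, ADD])         # a1 1 +
load_a5_4 = bytes([LOCAL + 5, SMALL + 4, ADD, FETCH])  # a5 4 + @
print(len(a1_plus_1), run(a1_plus_1, frame, {}))        # 3 [6]
print(len(load_a5_4), run(load_a5_4, frame, {104: 7}))  # 4 [7]
```

With this encoding `a1 1 +` takes 3 bytes and `a5 4 + @` takes 4, the same counts used for those lines in the table that follows.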

i compiled some code for rv32ec. here are 8 instructions of it occupying 22 bytes. what would this look like in a stack bytecode?

      38:   00158693      addi    a3,a1,1             a1 1 +                    3 bytecodes
      3c:   43d8          lw      a4,4(a5)            a5 4 + @                  4
      3e:   08e6fa63      bgeu    a3,a4,d2 <.L15>     < if <.L15>               3 bytecode bytes including jump target
      42:   439c          lw      a5,0(a5)            a5 @                      2
      44:   058a          slli    a1,a1,0x2           a1 2 lshift               3
      46:   7179          addi    sp,sp,-48           literal 48 stackframe     3
      48:   97ae          add     a5,a5,a1            +                         1
      4a:   0007a303      lw      t1,0(a5)            @                         1

we can see that in a stack instruction set this would use about 20 bytes, barely less, because the instructions being smaller is offset by their being more numerous. on the other hand, some of the code is occupied with doing things like allocating a stack frame, which usually isn’t necessary on a stack machine
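The totals can be checked mechanically from the table: each rv32ec encoding is printed in hex, so its length is half the number of hex digits (8 digits for a full 32-bit instruction, 4 for a 16-bit compressed one). The stack-side numbers are the per-line estimates from the table, not the output of any real compiler.

```python
# Rows of the table: (rv32ec instruction, encoding as printed in hex,
# stack-bytecode rendering, estimated bytes for the stack version).
ROWS = [
    ("addi a3,a1,1",     "00158693", "a1 1 +",                3),
    ("lw a4,4(a5)",      "43d8",     "a5 4 + @",              4),
    ("bgeu a3,a4,.L15",  "08e6fa63", "< if <.L15>",           3),
    ("lw a5,0(a5)",      "439c",     "a5 @",                  2),
    ("slli a1,a1,0x2",   "058a",     "a1 2 lshift",           3),
    ("addi sp,sp,-48",   "7179",     "literal 48 stackframe", 3),
    ("add a5,a5,a1",     "97ae",     "+",                     1),
    ("lw t1,0(a5)",      "0007a303", "@",                     1),
]

# Two hex digits per byte: 8 digits = 32-bit instruction, 4 = 16-bit RVC.
rv_bytes = sum(len(enc) // 2 for _, enc, _, _ in ROWS)
compressed = sum(1 for _, enc, _, _ in ROWS if len(enc) == 4)
stack_bytes = sum(est for *_, est in ROWS)

print(f"rv32ec: {len(ROWS)} instructions, {rv_bytes} bytes ({compressed} compressed)")
print(f"stack:  about {stack_bytes} bytes")
```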

but as far as i can tell on the x87 (which i have never programmed, i'm just going by p. a-1 (176/258) et seq. of http://bitsavers.trailing-edge.com/components/intel/80386/23...) all the instructions are at least two bytes, so i don't see where you get any extra code density
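For comparison, a few x87 register-stack encodings from Intel's opcode tables. Every x87 ESC instruction starts with an escape byte in the range 0xD8 to 0xDF followed by at least a ModRM or second opcode byte, so none is shorter than two bytes, and memory operands can add displacement bytes. The instruction selection and the hypot-style sequence are illustrative assumptions, not taken from the ray tracer.

```python
# x87 encodings (16-bit addressing for the one memory-operand example).
X87 = {
    "fld st(1)": bytes([0xD9, 0xC1]),
    "fld1": bytes([0xD9, 0xE8]),
    "fxch st(1)": bytes([0xD9, 0xC9]),
    "fadd st(0),st(1)": bytes([0xD8, 0xC1]),
    "fmul st(0),st(0)": bytes([0xD8, 0xC8]),
    "faddp st(1),st(0)": bytes([0xDE, 0xC1]),
    "fsqrt": bytes([0xD9, 0xFA]),
    "fld dword [bx]": bytes([0xD9, 0x07]),  # ModRM 0x07 = [bx], no displacement
}

# Hypothetical sequence: sqrt(x*x + y*y) with x in st(0) and y in st(1).
HYPOT = [
    "fmul st(0),st(0)",   # st0 = x*x
    "fxch st(1)",         # st0 = y, st1 = x*x
    "fmul st(0),st(0)",   # st0 = y*y
    "faddp st(1),st(0)",  # st1 += st0, then pop: st0 = x*x + y*y
    "fsqrt",              # st0 = sqrt(x*x + y*y)
]

hypot_bytes = sum(len(X87[m]) for m in HYPOT)
print(f"shortest x87 op: {min(len(e) for e in X87.values())} bytes")
print(f"hypot: {len(HYPOT)} instructions, {hypot_bytes} bytes")
```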

for what it's worth, the subroutine that the above was taken from compiles to 62 instructions and 156 bytes for rv32ec, 61 instructions and 189 bytes for amd64, and 52 instructions and 138 bytes for arm cortex-m4. i'll be compiling it for my own stack-based virtual machine this year but i don't have even a prototype compiler backend yet
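Dividing the quoted byte counts by the instruction counts gives the average instruction length for each target. This is only arithmetic on the figures for one subroutine and one compiler, so it should not be read as a general density ranking.

```python
# (instructions, bytes) for the same subroutine, as quoted in the comment.
BUILDS = {
    "rv32ec": (62, 156),
    "amd64": (61, 189),
    "cortex-m4": (52, 138),
}


def bytes_per_instruction(builds: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Average encoded instruction length in bytes for each target."""
    return {isa: nbytes / ninsns for isa, (ninsns, nbytes) in builds.items()}


for isa, avg in bytes_per_instruction(BUILDS).items():
    print(f"{isa:10} {avg:.2f} bytes/instruction")
```

In this sample amd64 has the longest average instruction (about 3.1 bytes) despite x86 having one-byte forms, which fits the distinction between x86's upper limit on density and what compiled code tends to look like.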

kazinator · 2y ago

There was a Usenix paper examining this.

https://www.usenix.org/legacy/events%2Fvee05%2Ffull_papers/p... [2005]

(Hey, I seem to remember that an Anton Ertl posts to comp.compilers.)

Spoiler: they claim that with their sophisticated translation from stack to register code, they eliminated 47% of the instructions, and the resulting code is still around 25% larger than the byte code. The size advantage goes to stack-based byte code, but it may not be as large as you might think.
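Taking the two figures as quoted here (not re-checked against the paper), they imply how much bigger an average register instruction is than an average stack instruction: with N stack instructions in B bytes, the register version has 0.53N instructions in 1.25B bytes.

```python
def avg_size_ratio(insns_eliminated: float, size_growth: float) -> float:
    """Average register-instruction size divided by average stack-instruction size.

    With N stack instructions in B bytes, the register version has
    (1 - insns_eliminated) * N instructions in (1 + size_growth) * B bytes.
    """
    return (1 + size_growth) / (1 - insns_eliminated)


ratio = avg_size_ratio(0.47, 0.25)
print(f"{ratio:.2f}")  # 2.36
```

So each register instruction comes out roughly 2.4 times the size of a stack instruction on average, which is why removing almost half the instructions still does not win on size.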

So more or less in line with your findings or intuition?

kragen · 2y ago

that hasn't been my experience with instructions i've written myself, but then again, i'm not rrrola
