
dezgeg · 2y ago

I think that was partly his point - on the 6502, typical-looking C code compiles to horrifically inefficient machine code, at least compared with architectures better suited to C.

On the 6502, to get optimal assembly you'd have to lay out your data as a structure of arrays instead of an array of structures, and use indices instead of pointers wherever possible (at least when there are fewer than 256 Ball objects, so an index fits in the 8-bit X or Y register).
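To make the layout difference concrete, here is a minimal C sketch, assuming a hypothetical `Ball` with 16-bit position and 16-bit delta fields and a hypothetical `NUM_BALLS` of 8. The AoS version reaches a field through a 16-bit pointer; the SoA version keeps one array per field and addresses it with a `uint8_t` index, which is the shape the comment above recommends. The names `balls_aos`, `ball_x`, `move_aos`, and `move_soa` are illustrative, not from semu or any real codebase.

```c
#include <stdint.h>

#define NUM_BALLS 8  /* hypothetical; must stay <= 256 so an index fits in a uint8_t */

/* Array of structures (AoS): typical-looking C. Accessing a field means
   arithmetic on a 16-bit pointer (base + i * sizeof(struct Ball) + offset). */
struct Ball {
    uint16_t x, y;
    int16_t dx, dy;
};
struct Ball balls_aos[NUM_BALLS];

void move_aos(struct Ball *b) {
    b->x += b->dx;
    b->y += b->dy;
}

/* Structure of arrays (SoA): one array per field. Each access is a fixed
   array base plus a small 8-bit index rather than a computed 16-bit pointer. */
uint16_t ball_x[NUM_BALLS], ball_y[NUM_BALLS];
int16_t ball_dx[NUM_BALLS], ball_dy[NUM_BALLS];

void move_soa(uint8_t i) {
    ball_x[i] += ball_dx[i];
    ball_y[i] += ball_dy[i];
}
```

Both functions do the same arithmetic; the difference is only in how the data is laid out and addressed, which is what matters to a 6502 code generator.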



wk_end · 2y ago

Yes, exactly. Were I hand-writing the assembly for the 6502, I'd make all sorts of decisions that the C code doesn't - and that a compiler can't - to make it more efficient.

Instead of passing a pointer to two separate functions, I'd write a single UpdateBalls procedure that operates on global data. This data is going to be core to my game logic and physics, so I'd put it all in the zero page (ZP), laid out as a structure of arrays, as you suggested. I'd choose a fixed number of balls so I don't need an argument. Maybe I'd make the loop iterate backwards so I get a free zero check from the decrement; maybe I'd unroll the loop ("dead" balls can be parked off-screen with a dx/dy of 0). I'd probably decide that I don't need 16-bit precision for the deltas (how fast could the balls move, really?), and a 16-bit + 8-bit addition is quicker than a 16-bit + 16-bit one.
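The decisions listed above can be sketched in C roughly as follows. This is a hedged illustration, not the code behind the Godbolt link: `UpdateBalls` comes from the comment, but `NUM_BALLS` (set to 8 here), the array names, and the 16-bit positions are assumptions. Putting the arrays in the zero page is toolchain-specific, so that step is left out.

```c
#include <stdint.h>

/* Hypothetical fixed ball count; a compile-time constant means no argument. */
#define NUM_BALLS 8

/* Global structure-of-arrays state. On a real 6502 build these would be
   placed in the zero page; how to request that depends on the toolchain
   and is omitted here. */
uint16_t ball_x[NUM_BALLS];
uint16_t ball_y[NUM_BALLS];
/* Deltas are only 8 bits, so each update is a 16-bit + 8-bit add. */
int8_t ball_dx[NUM_BALLS];
int8_t ball_dy[NUM_BALLS];

void UpdateBalls(void) {
    uint8_t i = NUM_BALLS;
    /* Count down to zero. On the 6502 this shape lets the decrement's zero
       flag drive the loop branch with no separate compare (whether a given
       compiler emits exactly that is up to the compiler). */
    do {
        --i;
        ball_x[i] += ball_dx[i];  /* the int8_t delta is sign-extended before the add */
        ball_y[i] += ball_dy[i];
    } while (i != 0);
}
```

A "dead" ball is simply one parked off-screen with `ball_dx` and `ball_dy` set to 0, so the loop never has to test a liveness flag; because the trip count is a constant, the loop is also a candidate for full unrolling.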

The compiler isn't going to make these optimizations, and that's not a slight against the compiler. In fact, I just checked: the output [0] when I write my C code this way is pretty close to what I'd hand-write. It's roughly a third of the instructions and - I'm not going to cycle count, so this is a stab in the dark - would probably take an order of magnitude fewer cycles to run. semu wasn't written with performance on the 6502 in mind and doesn't account for considerations like these, so it's inevitably going to be slow when compiled.

[0] https://godbolt.org/z/WYKKeh9b7

mysterymath · 2y ago

I'd actually like llvm-mos to do an automated AoS-to-SoA analysis and rewrite, but I haven't gotten around to it yet. There aren't any intrinsic theoretical obstacles that I'm aware of; it's just difficult code to write.

Now that this has come up again as the stock reason why "you can't do C well on the 6502", replacing the stack, the zero page, and the register set as the usual objections, I'm probably going to reprioritize it and put the register allocator work on pause.
