For the 6502, to get optimal assembly you'd have to lay out your data as a structure of arrays instead of an array of structures, and use indices instead of pointers as much as possible (at least when the number of Ball objects is < 256, so that an index fits in a single byte).
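To make the layout point concrete, here's a rough C sketch of what a structure-of-arrays version could look like. Everything named here (`MAX_BALLS`, `ball_x_lo`, `ball_set`, `balls_step`, and so on) is made up for illustration, and I'm assuming each Ball has 16-bit `x` and `dx` fields. The idea is that with one byte array per field and an 8-bit index, each access can map onto an absolute-indexed instruction such as `lda ball_x_lo,x`, with no pointer arithmetic. Whether a given compiler actually emits that is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap; it must stay <= 256 so a uint8_t index reaches every slot. */
#define MAX_BALLS 64

/* Structure of arrays: every field is split into low/high bytes,
   and each byte lane gets its own array indexed by ball number. */
static uint8_t ball_x_lo[MAX_BALLS], ball_x_hi[MAX_BALLS];
static uint8_t ball_dx_lo[MAX_BALLS], ball_dx_hi[MAX_BALLS];

/* Store a 16-bit position and velocity for ball i. */
void ball_set(uint8_t i, uint16_t x, uint16_t dx) {
    assert(i < MAX_BALLS);
    ball_x_lo[i]  = (uint8_t)(x & 0xFF);
    ball_x_hi[i]  = (uint8_t)(x >> 8);
    ball_dx_lo[i] = (uint8_t)(dx & 0xFF);
    ball_dx_hi[i] = (uint8_t)(dx >> 8);
}

/* Reassemble the 16-bit position of ball i. */
uint16_t ball_get_x(uint8_t i) {
    assert(i < MAX_BALLS);
    return (uint16_t)(ball_x_lo[i] | (ball_x_hi[i] << 8));
}

/* x += dx for the first `count` balls, one byte at a time with carry,
   which is roughly the shape of an adc-based 16-bit add. */
void balls_step(uint8_t count) {
    assert(count <= MAX_BALLS);
    for (uint8_t i = 0; i < count; i++) {
        uint16_t lo = (uint16_t)(ball_x_lo[i] + ball_dx_lo[i]);
        ball_x_lo[i] = (uint8_t)lo;                      /* low byte of the sum */
        ball_x_hi[i] = (uint8_t)(ball_x_hi[i] + ball_dx_hi[i] + (lo >> 8)); /* plus carry */
    }
}
```

The array-of-structures version would instead need a 16-bit pointer to each Ball plus a Y offset for every field, which is where the index juggling tends to come from.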
On line 14, it uses Y, then decrements it to 0, uses it, increments it, uses it, decrements it, uses it, then increments it again. Why not perform the indirect loads on lines 18 and 26 without the Y index and eliminate lines 16, 21, and 25?
Here's my pseudocode:
rc2 <= base of struct
rc4 = rc2 + 4 // addr of dx
rc5 = rc2 + 0 // addr of x
rc6 = *(&rc2+4)
rc4 = *(&rc4+1) // get low byte
rc5 = rc6 + *(&rc2) // add high byte
rc4 = rc4 + *(&rc2+1) // add low byte
rc2 = rc5 // store high result
*(&rc2+1) = rc4 // store low result
I believe it could have done more of the work in place, but my battery is about to die :(