But that's without any rendering, and non interacting particles which are extremely boring unless you like fireworks. (You can add a term like v[] += g
dt for free.) And you don't need to store colors for your particles if you can compute the colors from the particle number with a function.Rasterizing is slower, because each pixel of the image might get touched by multiple particles (which mean concurrent accesses in the GPU to the same memory address which they don't like).
Obtaining the screen coordinates is just a matrix multiply, but rendering the particles in the correct depth order requires multiple pass, atomic operations, or z-sorting. Alternatively you can slice your point clouds, by mixing them up with a peak-shaped weight function around the desired depth value, and use an order independent reduction like sum, but memory accesses are still concurrent.
For the rasterizing, you can also use the space partitioning indices of the particle to render to a part of the screen independently without concurrent access problems. That's called "tile rendering". Each tile render the subset of particles which may fall in it. (There are plenty of literature in the Gaussian Splatting community).