With the disclaimer that I have zero knowledge of the MacBook Neo hardware, but I do know a bit about GPUs in general (including having written some GPU-accelerated drivers for Windows and the associated cursor-handling code), I'm going to make a wild guess: this lag is caused by waiting for the GPU command queue to flush.
As a bit of background information: the GPU is fed commands from a queue that the CPU writes to. These commands perform the drawing operations that the GPU is designed to accelerate. A hardware cursor is basically a small bitmap that can be positioned anywhere on the screen and moved around by simply updating position registers (which is normally done per mouse interrupt); the hardware draws it automatically. A software cursor is manually drawn by the graphics stack, which saves what was under it, draws the cursor, and then whenever it needs to be moved, writes the original data back, saves the data at the new position, and then draws the cursor there.
Flushing the command queue is necessary when switching to a software cursor, or otherwise doing software writes to the framebuffer, because you need to wait for the GPU to finish drawing what it has queued, or it may end up drawing over what software wants to draw, including the cursor. Or worse, the command is a blit (e.g. scrolling a window) and you end up with remnants of the cursor at its previous position.
All an App needs on MacOS seems to be a binary and a little .plist
The root cause for the issue is probably (I'm not an Apple developer) due to huge round rectangles on the window shape corners. Rendering the window with the corners would include rendering whatever other windows and widgets under the window. (Which will have a lag and some more operations with transparency, which the developers probably want to avoid - while I'm not sure about this part).