While I'm similarly bitter about how things have effectively gotten worse, the actual driver path, assuming the same user-space code, became significantly more complex for good reasons.
Back in the Windows NT 4.0 - 5.2 days, GDI would draw directly to VRAM. Fast and simple, but prone to rendering glitches that left a corrupted screen until something redrew the area.
The number of pixels was way lower - and we were already using a lot of vector fonts at the time anyway. With higher resolutions, even when you scale by an integer factor, you need a way faster blitter, plus new caching and rendering methods. While GDI had reasonably good hooks for caching, they don't necessarily map well onto GPU architecture, and on-GPU blitting is way different from the old Windows 2D acceleration architecture that worked fine with GDI - and lower resolutions.
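To put rough numbers on the pixel count argument, here is a back-of-the-envelope sketch. The two resolutions, 4 bytes per pixel and the 60 Hz refresh rate are my own assumptions for illustration, not anything measured.

```python
# Rough arithmetic only; resolutions, pixel depth and refresh rate are assumed.

def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one full-screen frame in bytes."""
    return width * height * bytes_per_pixel

def bytes_per_second(width: int, height: int, hz: int = 60) -> int:
    """Bytes moved per second if the whole screen is redrawn every refresh."""
    return frame_bytes(width, height) * hz

old = frame_bytes(1024, 768)    # a typical late-90s desktop (assumed)
new = frame_bytes(3840, 2160)   # a 4K desktop (assumed)

print(old, new, new / old)
print(bytes_per_second(1024, 768), bytes_per_second(3840, 2160))
```

Under those assumptions a single full-screen redraw touches a bit over ten times as many bytes as it used to, before any extra passes for caching or effects.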
Both for security reasons and to prevent glitches, and honestly also to handle caching & rendering better on modern GPUs, you need a layer of indirection between GDI and the GPU. Once you have shaders rendering window contents as a texture on a triangle strip of 2 triangles, adding blur or transparency is close to zero cost unless you have a really resource-constrained system (and then you had other issues, really).
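A toy sketch of why transparency comes almost for free once every window is its own offscreen surface. This is plain Python on the CPU, not how any real compositor is written; the Window class, the flat color standing in for a texture, and the function names are all made up for illustration.

```python
# Hypothetical compositor sketch; every name here is illustrative.
from dataclasses import dataclass

Pixel = tuple  # (r, g, b), each channel in 0.0..1.0

@dataclass
class Window:
    x: int
    y: int
    width: int
    height: int
    color: Pixel     # stands in for the window's offscreen texture
    opacity: float   # 1.0 = opaque, 0.0 = invisible

def blend(dst: Pixel, src: Pixel, alpha: float) -> Pixel:
    """Standard 'source over' blend of src onto dst."""
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst))

def composite(windows, screen_w, screen_h, background=(0.0, 0.0, 0.0)):
    """Build a fresh frame by drawing windows back to front."""
    frame = [[background for _ in range(screen_w)] for _ in range(screen_h)]
    for win in windows:  # list order is Z-order, bottom-most first
        for row in range(max(win.y, 0), min(win.y + win.height, screen_h)):
            for col in range(max(win.x, 0), min(win.x + win.width, screen_w)):
                frame[row][col] = blend(frame[row][col], win.color, win.opacity)
    return frame

red = Window(0, 0, 2, 2, (1.0, 0.0, 0.0), 1.0)
blue = Window(1, 1, 2, 2, (0.0, 0.0, 1.0), 0.5)  # half-transparent, on top
frame = composite([red, blue], 3, 3)
print(frame[1][1])  # the pixel where both windows overlap
```

The point is that the applications never touch the final frame: they only fill their own surface, and opacity is one extra multiply in the blend step the compositor runs anyway.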
And Windows has had to track exact Z-order ever since Windows 2.0 introduced overlapping windows, otherwise the painter's algorithm & GDI caching would get confused (Z-order was used to know exactly when to send WM_PAINT to which window and with what params).
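A much simplified sketch of how Z-order decides who gets asked to repaint in the old non-composited model. The real thing works with proper update regions and subtracts the area covered by higher windows; here I only intersect rectangles, and Rect, paint_targets and the window names are all hypothetical.

```python
# Simplified illustration; real window managers use regions, not single rects.
from typing import NamedTuple, Optional

class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int

def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Overlap of two rectangles, or None if they do not overlap."""
    left, top = max(a.left, b.left), max(a.top, b.top)
    right, bottom = min(a.right, b.right), min(a.bottom, b.bottom)
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)

def contains(outer: Rect, inner: Rect) -> bool:
    """True if inner lies completely inside outer."""
    return (outer.left <= inner.left and outer.top <= inner.top
            and outer.right >= inner.right and outer.bottom >= inner.bottom)

def paint_targets(z_order, damaged: Rect):
    """Walk windows top-most first and list (name, rect to repaint).

    Stops at the first window that covers the whole damaged area,
    because nothing below it can be visible there.
    """
    targets = []
    for name, rect in z_order:
        clip = intersect(rect, damaged)
        if clip is None:
            continue
        targets.append((name, clip))
        if contains(rect, damaged):
            break
    return targets

z_order = [  # top-most first
    ("editor", Rect(0, 0, 50, 50)),
    ("browser", Rect(40, 40, 200, 200)),
    ("desktop", Rect(0, 0, 1000, 1000)),
]
print(paint_targets(z_order, Rect(60, 60, 100, 100)))
```

In that example only the browser would get a paint message: the editor does not touch the damaged area and the desktop is fully hidden behind the browser there. Get the Z-order wrong and either a hidden window paints over a visible one or the damaged area never gets redrawn.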
Animations are an iffy thing, but usability research suggests that some animations are indeed necessary, especially in a world where computers are often very silent and have no indicators (neither HDD sounds nor an activity LED, for example), to help the majority of users know whether the computer is "doing something" or has just hung.
As for the last point... I agree 200%.