The problem with all of these technologies is that they were invented by different divisions of Microsoft to do different things. That, and Microsoft chasing the Next Big Thing.
What we consider to be "Win32 apps" are built with the framework in USER (user32.dll these days), which is half reimplementation of the classic MacOS Toolbox API and half a pure-C object-oriented class system. It's been here since the beginning, and is the lowest common denominator for getting anything on screen. Every other toolkit eventually opens a USER window, attaches the appropriate window class and wndprocs to it, and then yields CPU control to an event loop that, among other things, contains a Windows USER message pump.
USER, being an object-oriented, pure-C[0] API, is infamously verbose to work with. The "200 line Hello World" example everyone passed around back in the 90s is specifically that verbose because of all the bookkeeping you have to do for USER's sake. It is possible to build USER apps that work well, but it puts a lot of onus on the programmer. Even things like high-DPI support[1] or resizable windows are a pain in the ass because they all have to be implemented manually.
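To make the bookkeeping concrete, here's a toy model of the pattern in Python. To be clear, this is not the Win32 API: every name in it (register_class, ToyWindow, post_message, run_message_pump) is made up for illustration, and it only mimics the shape of RegisterClass / CreateWindow / GetMessage / DispatchMessage / DefWindowProc.

```python
# Toy model of the USER pattern: window classes, wndprocs, and a message pump.
# Nothing here calls the real Win32 API; all names are illustrative.
from collections import deque

WM_PAINT, WM_CLOSE, WM_QUIT = "WM_PAINT", "WM_CLOSE", "WM_QUIT"

window_classes = {}      # class name -> wndproc (stand-in for RegisterClass)
message_queue = deque()  # stand-in for the thread's message queue


def register_class(name, wndproc):
    window_classes[name] = wndproc


class ToyWindow:
    def __init__(self, class_name, title):
        # A window gets its behavior from its class's wndproc.
        self.wndproc = window_classes[class_name]
        self.title = title
        self.log = []


def post_message(window, msg):
    message_queue.append((window, msg))


def default_proc(window, msg):
    # Simplified stand-in for DefWindowProc. The real one destroys the window
    # on WM_CLOSE, and the app's WM_DESTROY handler usually posts the quit.
    if msg == WM_CLOSE:
        post_message(None, WM_QUIT)


def hello_proc(window, msg):
    # The app handles what it cares about and forwards everything else.
    if msg == WM_PAINT:
        window.log.append(f"paint: Hello from {window.title}")
    else:
        default_proc(window, msg)


def run_message_pump():
    # Same spirit as the GetMessage/DispatchMessage loop.
    handled = 0
    while message_queue:
        window, msg = message_queue.popleft()
        if msg == WM_QUIT:
            break
        window.wndproc(window, msg)
        handled += 1
    return handled


register_class("HelloClass", hello_proc)
main_window = ToyWindow("HelloClass", "USER")
post_message(main_window, WM_PAINT)
post_message(main_window, WM_CLOSE)
handled_count = run_message_pump()
```

Even in toy form you can see where the verbosity comes from: the class registration, the wndproc switch, and the pump are all the programmer's job before a single pixel shows up.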
Microsoft's original answer for "USER is too hard" was to have you adopt Visual Basic or MFC, as you mentioned. AFAIK .NET WinForms was also a wrapper around USER. This is why Windows had a cohesive visual appearance all the way through to Windows 7: everything was just developer-friendly wrappers around common controls. Even third-party widget toolkits could incorporate those controls as subwindows as needed[2].
The problem with USER is that it was built for multiple windows and applications that render (using the CPU!) to a shared surface representing the final visible image. Modern toolkits instead have multiple separate surfaces and draw on them as needed, then hand the results to a compositor that mixes them with other windows to produce the final image. Windows Vista onward has the compositor, but the UI toolkits also need to be surface-aware instead of chucking a bunch of subwindows at DWM at the last minute.
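A rough sketch of the compositor model, again as a Python toy rather than anything DWM actually does: each window owns an off-screen surface, and a compose step blends them back to front. SurfaceWindow, make_surface and compose are hypothetical names, and "pixels" are just characters.

```python
# Toy model: each window owns an off-screen surface; a compositor blends
# the surfaces in z-order to build the frame. Names are illustrative only.

def make_surface(width, height, fill):
    return [[fill] * width for _ in range(height)]


class SurfaceWindow:
    def __init__(self, x, y, width, height, fill):
        self.x, self.y = x, y
        # Painted once by the "app"; the compositor only reads it afterwards.
        self.surface = make_surface(width, height, fill)


def compose(windows, width, height, background="."):
    frame = make_surface(width, height, background)
    for win in windows:  # back to front
        for row, line in enumerate(win.surface):
            for col, pixel in enumerate(line):
                fx, fy = win.x + col, win.y + row
                if 0 <= fx < width and 0 <= fy < height:
                    frame[fy][fx] = pixel
    return ["".join(line) for line in frame]


back = SurfaceWindow(0, 0, 3, 2, "A")
front = SurfaceWindow(1, 1, 3, 2, "B")

frame_before = compose([back, front], 5, 3)
front.x = 2  # move the front window; neither app repaints anything
frame_after = compose([back, front], 5, 3)
```

The point of the exercise: after the move, the part of the back window that was covered shows up again purely from its retained surface. In the shared-surface model the back window would get a paint message and have to redraw the exposed region itself.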
WPF is the first attempt at a modern UI toolkit. Relative to USER, resources are replaced with XAML and window classes are replaced with... well, actual language classes. Except it was developed by the DevTools division (aka DevDiv), and only ran on .NET with managed code. If you had a native application or just didn't want to pay the cost of having a CLR VM, tough.
Then the iPhone launched. And the iPad launched. The thing is, good tablet UI needs GPU-acceleration up and down the stack, so Microsoft shat themselves, gave the Windows division (aka WinDiv) the keys to the castle, and they completely rewrote WPF in C++ with some fancy language projections. That became "Metro" in Windows 8, then "Modern UI" after a trademark dispute. Microsoft wanted Windows 8 to be a tablet OS, damn it, with full-screen only apps and no third-party app distribution.
And then most people just bought Surface tablets, opened the Desktop "app", and used the same USER apps they were used to, complaining about the Start Screen along the way. So Microsoft pivoted back to a normal desktop with Modern UI apps, which are now called UWP apps, and there's a whole bunch of new glue APIs to let you stick XAML subwindows inside of USER or just use UWP outside of AppX packages, which is what Windows 8 should have done, and now everything is just a mess. WinUI 3 is just an upgrade to the XAML library that UWP apps use, but it sounds like Yet Another Toolkit. MAUI is some kind of meta-toolkit like the old AWT on Java.
At some level, I can explain this, but it's not reasonable. There is no "native" UI toolkit or consistent look-and-feel on Windows anymore. I suspect this, more than anything else, is the reason why Windows killed Aero blur-behind everywhere, and why Electron apps are so damned popular now. HTML and CSS are almost as old as USER, but they come with consistent engineering support and developer experience.
USER is an enhanced clone of the MacOS API, so it's natural to see what Apple did when confronted with the same problems. MacOS didn't have an object system at all; you just threw a bunch of controls onto a list and the system rendered them. That (along with user mode applications) was actually one of the reasons why they bought NeXT. OSX's AppKit toolkit shipped with compatibility bridges for Toolbox apps, but it was still about as advanced as USER was when it came to GPU usage, given that it was built around the same era as Windows, just for beefier hardware.
So what did Apple do? They made AppKit speak layers. They wrote a whole new compositing system called CoreAnimation to do in-process compositing, with all the common controls knowing how to manage it and layer-unaware third-party controls just doing whatever made sense. And this itself was a trojan horse for UIKit: the compositing library had been written to support a touch tablet demo that was later rolled into the Purple project to produce the iPhone. Y'know, the thing that actually kicked Microsoft's ass so much they decided to fracture their development ecosystem into 40 different UI toolkits with confusing names. In comparison, on modern macOS the big split comes from SwiftUI and Catalyst, but those are both wrappers around AppKit controls rather than ground-up rewrites of UI toolkits nobody dares touch.
[0] Or possibly Pascal, given the MacOS heritage
[1] The correct way to do high-DPI is for the windowing toolkit to work exclusively in virtual coordinates. Physical device coordinates and their derivatives should be converted to virtual ones as early as possible and converted back as late as possible. At a minimum, no user-facing APIs should use physical coordinates.
USER does not do this, even though there's an option to make it do this, which has worked wonders on every non-DPI-aware app I've thrown at it.
[2] Or, alternatively, implement their own. My favorite story about this is Internet Explorer, which ships with its own implementations of common controls specifically just so that HTML form elements don't have to hold an HWND each and can share the parent window.
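For what it's worth, the virtual-coordinate rule from [1] is simple enough to sketch. This is a minimal Python illustration under my own assumptions, not how USER or any real toolkit implements it: Display, to_virtual, to_physical and hit_test are hypothetical names, and the only real-world fact baked in is that Windows treats 96 DPI as 100% scaling.

```python
# Sketch of the rule in [1]: app code only ever sees virtual units;
# physical pixels exist only at the input and output edges.
from dataclasses import dataclass

BASE_DPI = 96  # Windows treats 96 DPI as 100% scaling


@dataclass(frozen=True)
class Display:
    dpi: int

    @property
    def scale(self):
        return self.dpi / BASE_DPI


def to_virtual(display, px_x, px_y):
    # Input path: convert device pixels away as early as possible.
    return px_x / display.scale, px_y / display.scale


def to_physical(display, vx, vy):
    # Output path: convert back only at rasterization time.
    return round(vx * display.scale), round(vy * display.scale)


def hit_test(rect, vx, vy):
    # App-level logic, written once, in virtual units only.
    x, y, w, h = rect
    return x <= vx < x + w and y <= vy < y + h


button = (10, 10, 80, 24)  # x, y, width, height in virtual units
normal = Display(dpi=96)
hidpi = Display(dpi=192)

# The same physical click position means different things per display.
click_hidpi = to_virtual(hidpi, 100, 50)
click_normal = to_virtual(normal, 100, 50)
hit_on_hidpi = hit_test(button, *click_hidpi)
hit_on_normal = hit_test(button, *click_normal)
```

The button's rectangle never changes between displays; only the two conversion functions know the DPI exists. That's the whole trick, and it's why the app-facing API shouldn't expose physical coordinates.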