I am not defending the utter shittiness of the Win32 processor group API design, which is the usual overly complicated Win32 API that only makes sense to an NT kernel developer. I can see why no application developer bothered to incorporate it in their software. It's basically asking you to do thread scheduling yourself once you have more than 64 logical processors.
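To make the "do it yourself" part concrete, here is a rough Python sketch of the bookkeeping an application ends up owning. `GetActiveProcessorGroupCount` and `GetActiveProcessorCount` are real kernel32 calls; `active_group_sizes`, `place_workers`, and the two-groups-of-64 fallback are names and numbers I made up for illustration. It only computes a (group, affinity mask) pair per worker. On Windows you would then hand each pair to `SetThreadGroupAffinity` through a `GROUP_AFFINITY` struct, which is left out here.

```python
import sys


def active_group_sizes():
    """Return the number of active logical processors in each processor group.

    On Windows this asks kernel32. Elsewhere it returns a hypothetical layout
    (two groups of 64) so the placement logic can still be exercised.
    """
    if sys.platform == "win32":
        import ctypes

        kernel32 = ctypes.windll.kernel32
        kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
        kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
        kernel32.GetActiveProcessorCount.restype = ctypes.c_ulong
        groups = kernel32.GetActiveProcessorGroupCount()
        return [kernel32.GetActiveProcessorCount(g) for g in range(groups)]
    return [64, 64]  # assumption: a made-up 128-processor machine


def place_workers(worker_count, group_sizes):
    """Assign each worker a (group, mask) pair, one worker per processor slot.

    Slots are filled group by group and wrap around once every processor has
    a worker. The mask covers the whole group, so the OS still chooses the
    processor inside that group.
    """
    total = sum(group_sizes)
    if total == 0:
        raise ValueError("no active processors reported")
    placements = []
    for worker in range(worker_count):
        slot = worker % total
        for group, size in enumerate(group_sizes):
            if slot < size:
                placements.append((group, (1 << size) - 1))
                break
            slot -= size
    return placements


if __name__ == "__main__":
    sizes = active_group_sizes()
    for worker, (group, mask) in enumerate(place_workers(4, sizes)):
        print(f"worker {worker}: group {group}, mask {mask:#x}")
```

The placement policy here is the dumbest one that works (fill group 0, then group 1, and so on). A real application would probably want to balance groups or keep related threads together, which is exactly the scheduling work the API pushes onto you.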
But I'm always annoyed to see people whinging about Windows performance when they're using "cross-platform" applications (that are using the lowest common denominator APIs) to supposedly measure perf.
For an application that knows the OS environment it's running in, the "scheduler" or the "kernel" is very unlikely to be an actual impediment most of the time. I use quotes because those words are usually used to mean "my app doesn't run as I expect and it can't be my fault".
This is true of all operating systems.
An old MSDN doc [1] actually says "The reason for initially limiting all threads to a single group is that 64 processors is more than adequate for the typical application." There is no newer doc that I could find, so perhaps the design hasn't changed in Windows 8 or 10.
"XYZ should be more than adequate for the typical application" is like a Microsoft basic design principle or something :-)
[1] https://docs.microsoft.com/en-us/previous-versions/windows/h...