When I asked why, the rationale was something like: "If a spinlock is ever held for more than such-and-such a number of microseconds, it was a mistake to make it a spinlock in the first place; it really should have been a mutex." So instead of degenerating in that case, they just made the spinlock actually behave like a mutex.
And once this happens, you now have a descheduled thread that owns the lock and a spinning thread waiting to acquire it, which is a recipe for priority inversion. As far as the kernel can tell, the spinning thread is the one doing important work, so it has no reason to preempt it and schedule the waiting thread, which is exactly the wrong thing to do in this situation.
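The spin-then-block behavior described above can be sketched in a few lines. Everything here (the type and function names, the spin budget) is hypothetical, and a real implementation would typically fall back to a futex wait rather than `sched_yield`:

```c
#include <stdatomic.h>
#include <sched.h>

/* Hypothetical adaptive lock: spin briefly, then degrade to
   mutex-like behavior. The budget stands in for the "such-and-such
   number of microseconds" heuristic described above. */
#define SPIN_BUDGET 1000

typedef struct { atomic_flag held; } adaptive_lock;

static void adaptive_lock_acquire(adaptive_lock *l) {
    for (int i = 0; i < SPIN_BUDGET; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            return;                  /* got the lock while spinning */
    }
    /* Past the budget: assume the owner was descheduled, so stop
       burning CPU and give the scheduler a chance to run it. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        sched_yield();
}

static void adaptive_lock_release(adaptive_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The budget encodes exactly the heuristic above: once spinning has gone on this long, the owner is probably descheduled, and burning more CPU only makes the inversion worse.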
As the term is pervasively used in industry, "spinlocks" always imply locking against other physical hardware, and that can't be done without spinning (I mean, you could fall back to blocking, but then the blocking implementation in the kernel would have to lock its own data structures to effect the scheduling; it's a chicken and egg problem). They usually, but not quite always, imply protection against asynchronous interrupts or signals too.
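For contrast, a "real" spinlock in the sense used here never blocks at all. A minimal userspace sketch (names are mine; a kernel version would additionally disable local interrupts around the critical section, per the point above about asynchronous interrupts):

```c
#include <stdatomic.h>

/* Minimal test-and-test-and-set spinlock: acquisition never sleeps,
   it just burns cycles until the holder releases. */
typedef struct { atomic_int locked; } raw_spinlock;

static void spin_lock(raw_spinlock *l) {
    for (;;) {
        /* Spin on a plain load first so waiters don't hammer the
           cache line with atomic read-modify-writes. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            return;
    }
}

static void spin_unlock(raw_spinlock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Note there is no syscall anywhere in the acquire path, which is the whole point: this is usable from contexts where asking the kernel to block would be circular.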
[1] https://www.kernel.org/doc/html/latest/locking/locktypes.htm...
[2] https://github.com/apple/darwin-libplatform/blob/main/src/os...
In kernel-level code, in the areas of the kernel that must use real spinlocks, they do not fall back to the equivalent of a futex, because that would mean the current thread of execution gets swapped out, and kernel-level spinlocks are designed for critical sections that cannot be swapped out. In critical sections of the kernel that can be swapped out, you'd use a mutex, which does the equivalent of a user-level futex wait (and swaps out) when the mutex is contended.
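A rough sketch of what "the equivalent of a user-level futex" looks like, following the classic Drepper-style three-state mutex (0 = unlocked, 1 = locked, 2 = locked with waiters). This assumes Linux, and the wrapper name is mine:

```c
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

/* Thin wrapper over the raw futex syscall (no glibc wrapper exists). */
static long futex(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void mutex_lock(atomic_int *m) {
    int c = 0;
    if (atomic_compare_exchange_strong(m, &c, 1))
        return;                        /* uncontended fast path: no syscall */
    if (c != 2)
        c = atomic_exchange(m, 2);     /* mark the lock contended */
    while (c != 0) {
        futex(m, FUTEX_WAIT, 2);       /* thread is swapped out until woken */
        c = atomic_exchange(m, 2);
    }
}

static void mutex_unlock(atomic_int *m) {
    if (atomic_exchange(m, 0) == 2)    /* old value 2 means someone waits */
        futex(m, FUTEX_WAKE, 1);
}
```

The key property, as the comment above says, is that the contended path deliberately lets the thread be descheduled, which is exactly what a kernel-internal spinlock must never do.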
If you're facing such tight latency constraints that you need a spinlock, you wouldn't want the thread to sleep at all; instead you'd pin the thread to a dedicated core and have that core spin constantly. That's how it's done in HFT. You can use a fancy network card like a SolarFlare with its ef_vi drivers to bypass the kernel completely for network IO (spin-waiting on the network card). Basically you need to treat the kernel as actively hostile to your latency budget, and do everything you can to keep it away from your low-latency threads.
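A minimal sketch of the pin-and-spin pattern, assuming Linux and glibc (`pthread_setaffinity_np` is a GNU extension); actually fencing the core off from the scheduler (isolcpus / nohz_full) and the kernel-bypass NIC part are out of scope here:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Pin the calling thread to one core so the kernel never migrates it. */
static int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Busy-poll a flag forever: wakeup latency is nanoseconds, and the
   price is that this core does nothing else, ever. */
static void busy_wait(atomic_int *ready) {
    while (!atomic_load_explicit(ready, memory_order_acquire))
        ;
}
```

In the real HFT setup the `busy_wait` loop would be polling the NIC's receive ring instead of a flag, but the structure is the same: never sleep, never syscall.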
That could be a wait queue, or it could be keeping track of which lock (if any) each thread is blocked on and iterating over all threads to see whether anything can be woken (relatively easy to implement, but not great if you have many threads), or something else entirely. If you're running on SMP or a re-entrant kernel, you need to be careful about how you order your checks so you don't have race conditions; you don't want a thread blocked on an unlocked lock.
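A toy, single-threaded model of the "iterate all threads" bookkeeping; all names and structures here are invented for illustration. In a real kernel, both the path that records `blocked_on` and this scan would run under the scheduler's own spinlock, which is what prevents the blocked-on-an-unlocked-lock race:

```c
#include <stddef.h>

#define NTHREADS 4

/* Hypothetical per-thread record: which lock (if any) it sleeps on. */
struct thread_rec {
    const void *blocked_on;   /* lock address, or NULL if not blocked */
    int runnable;
};

static struct thread_rec threads[NTHREADS];

/* Called by unlock(): scan every thread and wake any sleeper on this
   lock. O(threads) per unlock, hence "not great if you have many". */
static void wake_waiters(const void *lock) {
    for (int i = 0; i < NTHREADS; i++) {
        if (threads[i].blocked_on == lock) {
            threads[i].blocked_on = NULL;
            threads[i].runnable = 1;
        }
    }
}
```

The ordering concern from the comment above lives on the blocking side: a thread must publish its `blocked_on` entry and then re-check the lock (all under the scheduler lock) before actually sleeping, or an unlock racing in between would scan, find nobody, and leave the thread asleep forever.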