A finer-grained approach is to have a flag bit that prevents preemption, perhaps even just preemption by threads of the same process. This is weaker than CLI because it doesn't prevent I/O callbacks etc. from preempting; ideally those would be suspended as well for the process.
This assumes a non-privileged flag word, i.e. user-mode code owns the "process flags", not the kernel.
My favorite solution is a "process signal register" in hardware. It's a wide register full of test-and-set bits, shared by the threads of a process. The bits can be used to implement critical sections, semaphores, events, even waiting on a timer. All without a trip through the kernel: essentially zero-latency kernel primitives.
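To make the idea concrete, here is a rough software stand-in for such a register, using C11 atomics on one 64-bit word. Everything here is hypothetical: `sigreg_t` and the `sigreg_*` names are made up for illustration, the 64-bit width is an assumption, and real hardware would presumably offer more than a spin loop (for example, a way to sleep until a bit clears).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "process signal register":
 * 64 test-and-set bits shared by all threads of one process. */
typedef struct {
    _Atomic uint64_t bits;
} sigreg_t;

/* Atomically set one bit; return true if it was already set. */
static inline bool sigreg_test_and_set(sigreg_t *r, unsigned bit)
{
    assert(bit < 64);
    uint64_t mask = UINT64_C(1) << bit;
    uint64_t old = atomic_fetch_or_explicit(&r->bits, mask,
                                            memory_order_acquire);
    return (old & mask) != 0;
}

/* Clear one bit, e.g. to leave a critical section. */
static inline void sigreg_clear(sigreg_t *r, unsigned bit)
{
    assert(bit < 64);
    atomic_fetch_and_explicit(&r->bits, ~(UINT64_C(1) << bit),
                              memory_order_release);
}

/* Non-blocking attempt: true if this caller flipped the bit 0 -> 1. */
static inline bool sigreg_try_lock(sigreg_t *r, unsigned bit)
{
    return !sigreg_test_and_set(r, bit);
}

/* Critical-section entry: spin until this caller owns the bit. */
static inline void sigreg_lock(sigreg_t *r, unsigned bit)
{
    while (sigreg_test_and_set(r, bit)) {
        /* busy-wait; no kernel call involved */
    }
}
```

A thread would bracket its critical section with `sigreg_lock(&r, n)` and `sigreg_clear(&r, n)`. This only emulates the test-and-set part in ordinary memory; it does nothing about preemption of the lock holder, which is the problem the flag bit above was meant to address.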
And "process signal registers", while sounding attractive, aren't really a feasible alternative, given that the number of processes running even on a uniprocessor is overwhelming, to say the least. Plus, if they're beyond CPU control, privilege issues arise again.
In short, yes, there are many alternatives, but the current model works, and not just for x86. And you know what engineers say...