Still cheap: you only need to preempt the threads that are actively running user code. If a coroutine is ready to run but not actually running, you don't have to do anything with it (as long as you check for a pending safepoint before entering user code). That means your safepoint cost is `O(OS threads currently running user code)`, which in most runtimes is `O(num cores)`.
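To make that concrete, here is a minimal toy model in Python. It is a sketch under assumptions, not how any particular runtime does it: `Worker`, `Runtime`, `request_safepoint`, `release_safepoint`, and `try_resume` are all hypothetical names, and "preempting" is just setting a flag. The point is only that the number of threads you have to poke depends on how many workers are running user code, not on how many coroutines are sitting in the ready queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One OS thread in the (hypothetical) runtime."""
    running_user_code: bool = False
    preempt_requested: bool = False


@dataclass
class Runtime:
    workers: list
    ready: deque = field(default_factory=deque)  # coroutines waiting to run
    safepoint_pending: bool = False

    def request_safepoint(self) -> int:
        """Ask for a safepoint; return how many workers had to be preempted."""
        self.safepoint_pending = True
        signalled = 0
        for w in self.workers:
            # Only workers inside user code need a preemption request.
            # Ready-but-not-running coroutines are never touched here.
            if w.running_user_code:
                w.preempt_requested = True
                signalled += 1
        return signalled

    def release_safepoint(self) -> None:
        """End the safepoint so workers may enter user code again."""
        self.safepoint_pending = False
        for w in self.workers:
            w.preempt_requested = False

    def try_resume(self, worker: Worker) -> bool:
        """Scheduler path: check for a pending safepoint before user code."""
        if self.safepoint_pending or not self.ready:
            return False
        self.ready.popleft()
        worker.running_user_code = True
        return True


def demo() -> tuple:
    # 4 workers (think: 4 cores), 10,000 ready coroutines, 3 workers busy.
    rt = Runtime(workers=[Worker() for _ in range(4)],
                 ready=deque(range(10_000)))
    for w in rt.workers[:3]:
        w.running_user_code = True
    signalled = rt.request_safepoint()
    resumed = rt.try_resume(rt.workers[3])  # blocked by the pending safepoint
    return signalled, resumed, len(rt.ready)
```

In this model `demo()` signals three workers even though ten thousand coroutines are queued, and the idle worker refuses to pick up new work until `release_safepoint()` is called. That check in `try_resume` is what lets you ignore the ready queue entirely.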