If your work is mostly compute, you usually don't want more concurrency than one, or maybe a few, workers per core, and then OS scheduling is easy. If your work is mostly waiting on I/O, high concurrency makes more sense, and OS scheduling still isn't going to be hard there, because it costs the OS almost nothing to leave a process blocked on I/O. You do need good timer scalability if you have a lot of processes, though, since they're all going to want to set and clear a timeout on most of their syscalls. io_uring and similar interfaces, driven from a small number of OS processes/threads, might be less work for the kernel, but certainly at the cost of isolation.
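To make the two cases concrete, here is a rough Python sketch. It is not io_uring and it is not how any particular server does it. It only shows the shape of the trade-off: a worker count tied to core count for compute, and a large number of I/O waits multiplexed in one OS thread, each arming a timeout that is normally cancelled before it fires. The names `cpu_workers`, `fake_io`, `wait_with_timeout` and `run_many` are made up for illustration, and `asyncio.sleep` stands in for a real network or disk wait. The timers here live in asyncio's userspace event loop rather than in the kernel, so this illustrates the set-and-clear pattern, not kernel timer cost.

```python
import asyncio
import os


def cpu_workers(per_core: int = 1) -> int:
    """Worker count for compute-bound work: roughly one (or a few) per core."""
    return max(1, (os.cpu_count() or 1) * per_core)


async def fake_io(delay: float) -> str:
    """Stand-in for an I/O wait (assumption: real code would await a socket)."""
    await asyncio.sleep(delay)
    return "done"


async def wait_with_timeout(delay: float, timeout: float) -> str:
    """Arm a timeout around one wait; it is cancelled if the wait finishes first."""
    try:
        return await asyncio.wait_for(fake_io(delay), timeout)
    except asyncio.TimeoutError:
        return "timeout"


async def run_many(n: int, delay: float, timeout: float) -> list:
    """Run n concurrent waits in a single OS thread, each with its own timeout."""
    tasks = (wait_with_timeout(delay, timeout) for _ in range(n))
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print("compute workers:", cpu_workers())
    results = asyncio.run(run_many(10_000, delay=0.01, timeout=1.0))
    print("completed:", results.count("done"), "timed out:", results.count("timeout"))
```

All 10,000 waits share one address space, which is the isolation cost mentioned above: a bug in one task can take down the rest, which separate processes would not allow.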