> Thought there was a limit for work that can be done concurrently by the CPU,
Right. But in an IO bound scenario, the CPU isn't doing work; it's waiting on IO. So, because threads are generally heavy, you don't want a ton of them, taking up memory, doing nothing.
But, when you have lightweight threads, you can spin up one per connection. This ends up being simpler, and you don't have the large memory usage. This is what nginx does, in a sense. It still has a worker per core, but each of those workers can handle thousands of requests simultaneously, because it's all non-blocking.
That limit to concurrent work is exactly why non-blocking architectures are so important, and task systems fit into them really nicely.