Context switches between threads are still very expensive. You have 4,000 threads, but those belong to lots of different processes, each spinning up its own threads. For a single computational problem it's still more efficient to run one thread per core, or at most one per hardware thread (often 2 per core now). You can test this with something like rayon or GNU parallel by using more threads than you have cores. It won't go faster, and past a certain point it gets slower.
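If you'd rather try this without pulling in rayon, here's a rough std-only sketch. The function name `parallel_sum_squares`, the input size, and the thread counts are all made up for illustration. The numbers will vary a lot by machine, and a simple loop like this can hit memory bandwidth before it runs out of cores, so treat it as a starting point rather than a proper benchmark.

```rust
use std::thread;
use std::time::Instant;

/// Sum of squares of `data` (wrapping on overflow), split across up to
/// `n_threads` OS threads. Hypothetical helper, not part of any library.
fn parallel_sum_squares(data: &[u64], n_threads: usize) -> u64 {
    let n_threads = n_threads.max(1);
    // Ceiling division so every element lands in some chunk.
    let chunk_len = ((data.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        // One scoped OS thread per chunk.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .fold(0u64, |acc, x| acc.wrapping_add(x.wrapping_mul(*x)))
                })
            })
            .collect();
        handles
            .into_iter()
            .fold(0u64, |acc, h| acc.wrapping_add(h.join().unwrap()))
    })
}

fn main() {
    let data: Vec<u64> = (0..5_000_000).collect();
    // Number of hardware threads the OS reports; falls back to 1.
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    for n in [1, cores, cores * 2, cores * 8] {
        let start = Instant::now();
        let total = parallel_sum_squares(&data, n);
        println!("{n:>5} threads: {:?} (checksum {total})", start.elapsed());
    }
}
```

Build with `--release` and compare the timings at `cores` against the larger counts.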
The async case is suited to situations where you're blocked waiting on things like network requests. In that case the thread would be doing nothing, so we want to hand it over to some other task that has work to do. Green threads mean you can do that without an OS-level context switch.
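To make the hand-off concrete, here's a toy std-only sketch of two tasks sharing one OS thread. Everything in it (`FakeRequest`, `task`, `run_all`, `run_demo`) is invented for illustration, and the "executor" just polls in a loop. A real async runtime would sleep until the OS reports that a socket is ready instead of spinning.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for a network request: pending on the first poll, ready on the second.
struct FakeRequest {
    polled_once: bool,
}

impl Future for FakeRequest {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.polled_once {
            Poll::Ready(())
        } else {
            self.polled_once = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending // give the thread back to the scheduler
        }
    }
}

/// Waker that does nothing; the toy loop below re-polls everything anyway.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

type Log = Rc<RefCell<Vec<String>>>;
type Task = Pin<Box<dyn Future<Output = ()>>>;

async fn task(name: &'static str, log: Log) {
    log.borrow_mut().push(format!("{name}: send request"));
    FakeRequest { polled_once: false }.await; // "blocked" here
    log.borrow_mut().push(format!("{name}: got response"));
}

/// Round-robin over the tasks on the current thread until all are done.
fn run_all(mut tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        // Keep only the tasks that are still pending.
        tasks.retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
    }
}

fn run_demo() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let mut tasks: Vec<Task> = Vec::new();
    tasks.push(Box::pin(task("a", log.clone())));
    tasks.push(Box::pin(task("b", log.clone())));
    run_all(tasks);
    let lines = log.borrow().clone();
    lines
}

fn main() {
    for line in run_demo() {
        println!("{line}");
    }
}
```

Both tasks send their request before either one gets a response, and no second OS thread is involved: when task `a` returns `Pending`, the loop simply calls into task `b`, which is an ordinary function call rather than a trip through the kernel scheduler.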