Myth. Performance won't be better. Scaling arguably is better, but usually the use case doesn't require the level of scaling at which async beats OS threads.
For example, go ahead and implement an RPC server that only has to handle 10 concurrent requests, then measure latencies. The synchronous version might be faster, since it doesn't need any epoll calls. The difference might get even bigger if, e.g., the server is serving static files and you are measuring throughput: the synchronous version will likely perform better, because it avoids the extra context switch from the async runtime of your choice to a file-I/O thread pool and back.
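A minimal std-only sketch of the synchronous side of that experiment, assuming a line-based echo protocol as a stand-in for a real RPC: one OS thread per connection, blocking reads and writes, no event loop. The names `handle` and `run_demo` are hypothetical. Only the synchronous half is shown; the async half would need a runtime such as Tokio, which isn't included here, and the latencies it prints depend entirely on the machine.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical handler: read a line, echo it back, using plain blocking I/O.
/// Each connection runs on its own OS thread, so no epoll loop is involved.
fn handle(stream: TcpStream) -> std::io::Result<()> {
    stream.set_nodelay(true)?;
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut writer = stream;
    let mut line = String::new();
    // Serve requests until the client closes the connection (read returns 0).
    while reader.read_line(&mut line)? > 0 {
        writer.write_all(line.as_bytes())?;
        line.clear();
    }
    Ok(())
}

/// Start a thread-per-connection echo server on an ephemeral localhost port,
/// drive it with `clients` concurrent clients doing `requests` round trips
/// each, and return every per-request round-trip latency.
pub fn run_demo(clients: usize, requests: usize) -> std::io::Result<Vec<Duration>> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Accept loop: spawn one OS thread per accepted connection.
    thread::spawn(move || {
        for stream in listener.incoming() {
            if let Ok(stream) = stream {
                thread::spawn(move || {
                    let _ = handle(stream);
                });
            }
        }
    });

    // Each client is also a plain OS thread doing blocking round trips.
    let workers: Vec<_> = (0..clients)
        .map(|id| {
            thread::spawn(move || -> std::io::Result<Vec<Duration>> {
                let stream = TcpStream::connect(addr)?;
                stream.set_nodelay(true)?;
                let mut reader = BufReader::new(stream.try_clone()?);
                let mut writer = stream;
                let mut latencies = Vec::with_capacity(requests);
                let mut reply = String::new();
                for n in 0..requests {
                    let msg = format!("client {id} request {n}\n");
                    let start = Instant::now();
                    writer.write_all(msg.as_bytes())?;
                    reply.clear();
                    reader.read_line(&mut reply)?;
                    latencies.push(start.elapsed());
                    assert_eq!(reply, msg, "echo mismatch");
                }
                Ok(latencies)
            })
        })
        .collect();

    let mut all = Vec::new();
    for w in workers {
        all.extend(w.join().expect("client thread panicked")?);
    }
    Ok(all)
}

fn main() -> std::io::Result<()> {
    // 10 concurrent clients, as in the scenario above.
    let mut lat = run_demo(10, 100)?;
    lat.sort();
    println!("requests: {}", lat.len());
    println!("p50: {:?}", lat[lat.len() / 2]);
    println!("p99: {:?}", lat[lat.len() * 99 / 100]);
    Ok(())
}
```

To make the comparison, one would implement the same echo protocol on an async runtime and compare the printed percentiles at the same client count.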
You are also right that beyond a certain scale the async version might offer better performance. But where that threshold lies differs per application, and not every application ever reaches it.
It's certainly not as if choosing a green-thread model means you unconditionally throw away a 5x performance factor or something.
There are absolutely cases where that does matter. To name just one, a game engine would not want to throw away that level of performance out of the box. (That's the game engine user's job, to "spend" the quality of the game engine on their task.) But I think there are a lot more programmers who have, without analysis, assumed they're in that class and made a lot of decisions based on that, when in fact they are multiple orders of magnitude away from it. To pick a number out of thin air, 4 full CPU cores running Rust code that someone has at least glanced at and spent a bit of time optimizing is a loooooot of power.
(The closest current comparison is Rust vs. Go, but Rust works much harder at compile-time optimization and doesn't have GC, and I expect those two things account for the majority of the delta between them, with Go's green threading a non-trivial but clearly minority contributor. Stay tuned for Java with Project Loom versus Rust, which has its own rather major differences but will at least be another relevant data point.)