Compared to what?
Doing epoll manually?
If you care enough, you generally should be able to outdo the reactor and state machines. Whether you should care enough is debatable.
Even just synchronizing on an atomic can thrash both branch prediction and the L1 cache, let alone working your way through a task queue and interrupting program flow to do so.
But the moment somebody drops async into my codebase, yay, now I get to pay the cost.
Sure there's no reason to do that, because non-blocking syscalls are just better, but you can…
What you're paying for with async/await is a state machine that describes the concurrent task, and that state machine can be incredibly wasteful in size because of how futures are designed and how the desugaring pass converts async/await into it.
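To make the size point concrete, here's a rough sketch you can run yourself. The names (`tick`, `holds_buffer`, `wrapper`) and the 4 KiB buffer are made up for illustration. Anything that stays alive across an `.await` has to be stored inside the generated state machine, and a future that awaits another future holds it by value. Exact byte counts depend on the compiler version, so none are claimed here.

```rust
use std::mem::size_of_val;

// Trivial future to await on; hypothetical helper for this sketch.
async fn tick() {}

// `buf` is still needed after the await, so it has to be stored
// inside the state machine the compiler generates for this function.
async fn holds_buffer(seed: u8) -> u8 {
    let buf = [seed; 4096];
    tick().await;
    buf[0]
}

// Awaiting another future embeds that future by value, so this one
// is at least as large as `holds_buffer`'s future.
async fn wrapper(seed: u8) -> u8 {
    holds_buffer(seed).await
}

fn main() {
    // The futures are never polled; we only look at their size.
    let inner = holds_buffer(7);
    let outer = wrapper(7);
    println!("inner future: {} bytes", size_of_val(&inner));
    println!("outer future: {} bytes", size_of_val(&outer));
}
```

Both futures come out at 4096 bytes or more even though only one byte of the buffer is ever read after the await.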
That's why I said it's not "zero cost" in the loosest definition of the phrase - you can write a better implementation by hand.
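As a rough illustration of what "by hand" can look like, here's a sketch of a hand-written future that uses a 4 KiB scratch buffer before it suspends but only keeps the one byte it needs afterwards. `FirstByte`, `NoopWake` and `run_to_completion` are hypothetical names for this example, and the busy-poll loop is a stand-in for a real executor, not something you'd ship.

```rust
use std::future::Future;
use std::mem::size_of;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hand-written state machine: yields once, then returns the stored byte.
struct FirstByte {
    byte: u8,
    yielded: bool,
}

impl FirstByte {
    fn new(seed: u8) -> Self {
        // Scratch space is only used before suspending, so it never
        // becomes part of the future itself.
        let buf = [seed; 4096];
        FirstByte { byte: buf[0], yielded: false }
    }
}

impl Future for FirstByte {
    type Output = u8;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u8> {
        if !self.yielded {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            return Poll::Pending;
        }
        Poll::Ready(self.byte)
    }
}

// Waker that does nothing; enough for a busy-polling demo.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls the future in a loop until it completes.
fn run_to_completion(seed: u8) -> u8 {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = FirstByte::new(seed);
    let mut pinned = Pin::new(&mut fut); // fine because FirstByte is Unpin
    loop {
        if let Poll::Ready(value) = pinned.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn main() {
    println!("hand-written future: {} bytes", size_of::<FirstByte>());
    println!("result: {}", run_to_completion(7));
}
```

The trade-off is obvious from the line count: you get to decide exactly what lives across the suspension point, but you write and maintain the state machine yourself.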