Indeed, one thing I’ve always wondered is whether you can submit a read request for a page-aligned buffer and have the kernel arrange for data to be written directly into it without any additional copies. That’s probably not possible since there’s routing happening in the kernel and it accumulates everything into sk_buffs.
But maybe it could arrange for the framing part of the packet and the data to be decoupled so that it can just give you a mapping into the data region (maybe instead of you even providing a buffer, it gives you back an address mapped into your space). Not sure if that TLB update might be more expensive than a single copy.
Parity with synchronous programming is an explicit goal of Rust async, declared many times (e.g. see here https://github.com/rust-lang/rust-project-goals/issues/105). I agree with your rant about the illusion of synchronicity, but it does not matter. The synchronous abstraction is immensely useful in practice, and the less leaky it is, the better.
To me it's pretty clear that parity in the issue referenced refers to equivalence parity - that is, you can accomplish the same tasks in some way, not that it's a drop-in replacement. I haven't seen it suggested anywhere that async lets you write synchronous code without any changes, nor that integrating completion-style APIs with async will yield code that looks synchronous. For one, completion-style APIs are for performance, and performance APIs are rarely structured for simplicity but rather to avoid the implicit costs hidden in common leaky (but simpler) abstractions. For another, completion-style APIs in synchronous programming ALSO look different from epoll/select-like APIs, so I really don't understand the argument you're trying to make.
EDIT:
> You have an inevitable overhead of managing the owned buffer when compared against simply passing mutable borrow to an already existing buffer. Imagine if `io::Read` APIs were constructed as `fn read(&mut self, buf: Vec<u8>) -> io::Result<Vec<u8>>`.
I'm imagining it and I don't see a huge problem in terms of the overhead this implies. And you'd probably not take in a Vec directly but some I/O-specific type, since such an API would be for performance.
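To make that concrete, here's a rough sketch of the owned-buffer shape from the quote, layered on top of std's `io::Read`. The `OwnedRead` trait, `read_owned`, and `read_all_owned` are names I made up for illustration, not anything in std or a real crate, and a real completion-based backend would fill the buffer itself rather than delegate to `read`.

```rust
use std::io::{self, Read};

/// Hypothetical owned-buffer variant of `io::Read` (not part of std).
/// The caller hands the buffer over and gets it back; the returned
/// Vec's length is the number of bytes read.
trait OwnedRead {
    fn read_owned(&mut self, buf: Vec<u8>) -> io::Result<Vec<u8>>;
}

// Illustration only: adapt any borrowed-buffer reader to the owned shape.
impl<R: Read> OwnedRead for R {
    fn read_owned(&mut self, mut buf: Vec<u8>) -> io::Result<Vec<u8>> {
        // Expose the whole existing allocation; growing to `capacity()`
        // never reallocates.
        let cap = buf.capacity();
        buf.resize(cap, 0);
        let n = self.read(&mut buf)?;
        // Shrinking the length keeps the allocation as well.
        buf.truncate(n);
        Ok(buf)
    }
}

/// Reads `src` to the end while recycling a single buffer.
/// `chunk` must be non-zero, otherwise the first read looks like EOF.
fn read_all_owned<R: Read>(mut src: R, chunk: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = Vec::with_capacity(chunk);
    loop {
        // Ownership moves into the call and comes straight back.
        buf = src.read_owned(buf)?;
        if buf.is_empty() {
            break; // zero bytes read: end of input
        }
        out.extend_from_slice(&buf);
    }
    Ok(out)
}
```

The per-call cost on the caller's side is moving a pointer/length/capacity triple in and out; the allocation itself is reused across every iteration of the loop.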
It makes sense to ask the ring wrapper for memory that you can emplace your payload into before submitting the IO if you want to use zero-copy.
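Roughly what I have in mind, as a toy model with no real kernel interaction. `Ring`, `Slot`, `alloc`, `submit`, and `complete` are all hypothetical names for this sketch, not the API of any actual io_uring crate, and `complete` copies the payload out only so the simulation has something to show.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a ring wrapper that owns a fixed set of buffers.
/// A real wrapper would register these buffers with the kernel.
struct Ring {
    slots: Vec<Option<Box<[u8]>>>,     // None while a slot is lent out
    free: Vec<usize>,                  // indices available to hand out
    in_flight: VecDeque<(usize, usize)>, // (slot index, payload length)
}

/// A buffer on loan from the ring; the payload is written in place.
struct Slot {
    index: usize,
    buf: Box<[u8]>,
}

impl Ring {
    fn new(count: usize, size: usize) -> Self {
        Ring {
            slots: (0..count)
                .map(|_| Some(vec![0u8; size].into_boxed_slice()))
                .collect(),
            free: (0..count).rev().collect(),
            in_flight: VecDeque::new(),
        }
    }

    /// Lends out a free buffer, or returns None if all are in use.
    fn alloc(&mut self) -> Option<Slot> {
        let index = self.free.pop()?;
        let buf = self.slots[index].take()?;
        Some(Slot { index, buf })
    }

    /// Hands the filled buffer back and queues it for (simulated) I/O.
    fn submit(&mut self, slot: Slot, len: usize) {
        assert!(len <= slot.buf.len(), "payload longer than slot");
        self.slots[slot.index] = Some(slot.buf);
        self.in_flight.push_back((slot.index, len));
    }

    /// Simulates a completion: returns the bytes that were queued for
    /// the oldest submission and marks its slot as free again.
    fn complete(&mut self) -> Option<Vec<u8>> {
        let (index, len) = self.in_flight.pop_front()?;
        let payload = self.slots[index].as_ref()?[..len].to_vec();
        self.free.push(index);
        Some(payload)
    }
}
```

The point is just the ownership flow: the caller never brings its own memory, it writes into `slot.buf` and gives the slot back on submit, so the buffer handed to the ring is the same memory the payload was built in.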