I wonder if io_uring could be used to issue a single syscall that reads data from disk (in practice, from the page cache) and sends it over the network.
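
For comparison, here is a rough sketch of what "one syscall from page cache to socket" looks like today. It uses the long-standing sendfile(2) call through Python's `os.sendfile`, not io_uring. It assumes Linux. The helper names (`send_file_over_socket`, `demo`) and the local socket pair standing in for a real network connection are made up for illustration.

```python
import os
import socket
import tempfile


def send_file_over_socket(path: str, sock: socket.socket) -> int:
    """Send the whole file at `path` to `sock`; return the number of bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # Each os.sendfile call is one sendfile(2) syscall: the kernel copies
            # from the page cache to the socket without a user-space buffer.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:  # unexpected EOF, e.g. the file was truncated
                break
            offset += sent
        return offset


def demo() -> bytes:
    """Hypothetical demo: push a small temp file through a local socket pair."""
    payload = b"hello from the page cache\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    sender, receiver = socket.socketpair()
    try:
        send_file_over_socket(tmp.name, sender)
        sender.shutdown(socket.SHUT_WR)  # signal end of stream to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
```

For a small file the loop runs once, so the whole transfer costs a single syscall. The loop is only there because sendfile may send fewer bytes than requested.
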
Of course, you could use DPDK or similar kernel-bypass technologies to do the opposite: read the data from disk once, keep it in user-space buffers, and then hand it straight to the NIC without another syscall. In theory that should still be faster, since it needs 0 syscalls per request, whereas the io_uring approach would need 1 per request.