Meanwhile, in networking, there is RDMA, which indeed allows memory to be transferred over the network without involving the CPU on either end.
But even ignoring those things, memory bandwidth is actually a big reason why you should not want to transform things upfront. You see, if you have an upfront transformation, you're streaming the data into the CPU, then back out in the new format, and then reading it back in again later on when you actually use it. If your data is small enough, perhaps it all stays in cache and this isn't an issue. But if it's large, then you're doubling or tripling your memory bandwidth usage by having a decode step.
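To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the data is far too big for cache, so every pass over it goes to RAM, and it only counts bulk passes over the data. The function name `memory_traffic` and its parameters are made up for illustration; this is not a measurement of any real system.

```python
def memory_traffic(wire_bytes: int, decoded_bytes: int, decode_upfront: bool) -> int:
    """Rough count of bytes moved between RAM and the CPU.

    Assumption: nothing stays in cache between passes.
    """
    if decode_upfront:
        # Pass 1: read the wire format and write out the decoded structure.
        # Pass 2: read the decoded structure back in when the app uses it.
        return wire_bytes + decoded_bytes + decoded_bytes
    # In-place access: the wire format is read once, at the point of use.
    return wire_bytes


GIB = 1 << 30
in_place = memory_traffic(GIB, GIB, decode_upfront=False)
upfront = memory_traffic(GIB, GIB, decode_upfront=True)
print(upfront / in_place)  # 3.0
```

With a decoded form the same size as the wire form you get the 3x case. If the decoded form is smaller, or part of it is still in cache when you use it, the ratio drops toward 2x, which is presumably where "doubling or tripling" comes from.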
In any case, the fact is that there are lots of real servers out there that are CPU bound and spend double-digit percentages of their time encoding and decoding Protobufs.