This is a provocative framing, but I'm not sure it makes sense. Functions aren't resources; they don't have throughput or utilization. It would be bad if a core could only call the function 300-600 times per second, but that's exactly why we have async programming models, lightweight threads, etc.: so the core can do other stuff during the waiting-on-IO slices of the timeline, which, as you mention, dominate.
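To make that concrete, here's a toy asyncio sketch (function names and timings are invented for illustration): one core puts a hundred fake "RPCs" in flight at once, and the total wall time is roughly one round trip, not a hundred.

```python
import asyncio
import time

async def fake_rpc(i: int) -> int:
    # Simulate a ~50 ms network round trip. While this task is parked
    # on the timer, the event loop is free to run the other 99 tasks.
    await asyncio.sleep(0.05)
    return i

async def main() -> float:
    start = time.monotonic()
    # 100 "calls" in flight concurrently on a single core.
    results = await asyncio.gather(*(fake_rpc(i) for i in range(100)))
    assert results == list(range(100))  # order is preserved by gather
    return time.monotonic() - start

elapsed = asyncio.run(main())
# Wall time is on the order of one wait (~0.05 s), not 100 sequential waits (~5 s).
print(f"{elapsed:.2f}s")
```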
It would also be bad if a user had to wait on 300-600 sequential RPCs to get a response to a single request, but like... don't do that. Remote endpoints are not for use in tight loops. There are cases where pathological architectures lead to ridiculous fanout/amplification, but even then we are usually talking about parallel tasks, not a sequential chain.
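The sequential-vs-parallel distinction is the whole game for fanout. A toy comparison (the 20 ms "backend" and the call counts are made up):

```python
import asyncio
import time

async def backend_call() -> None:
    # Pretend this is a ~20 ms round trip to a downstream service.
    await asyncio.sleep(0.02)

async def sequential(n: int) -> float:
    start = time.monotonic()
    for _ in range(n):
        await backend_call()  # each await finishes before the next starts: n round trips
    return time.monotonic() - start

async def fanout(n: int) -> float:
    start = time.monotonic()
    # All n calls in flight at once: roughly one round trip total.
    await asyncio.gather(*(backend_call() for _ in range(n)))
    return time.monotonic() - start

seq = asyncio.run(sequential(20))
par = asyncio.run(fanout(20))
print(f"sequential: {seq:.2f}s, parallel: {par:.2f}s")
```

Same amount of "work" either way; only the shape of the waiting changes.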
There is overhead to doing things remotely vs. locally, but the waiting isn't the interesting part. It's serialization, deserialization, copying, tracking which tasks are waiting, etc. A lot of performance work goes on around these topics: compact and efficient binary wire protocols, zero-copy network stacks, epoll, green threads, async function coloring schemes, and so on. The payoff from this work, as is typical in the web/enterprise backend world, is also not so much lower latency for individual requests (those are usually simple) as a larger number of concurrent requests/users you can serve from a given hardware footprint. That is normally what we're optimizing for. It's a different set of constraints vs. few but individually expensive computations, so of course the solution space looks different too.
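As a tiny illustration of the wire-protocol point, here's a hand-rolled fixed binary layout (the record and field layout are purely hypothetical) next to JSON for the same payload:

```python
import json
import struct

record = {"user_id": 123456, "score": 98.5, "flags": 7}

# Text protocol: self-describing and human-readable, but bigger and
# more work to parse on every request.
text_wire = json.dumps(record).encode()

# Fixed binary layout: little-endian uint64 id, float64 score, uint32 flags.
# No field names on the wire; both sides must agree on the schema.
binary_wire = struct.pack("<QdI", record["user_id"], record["score"], record["flags"])

print(len(text_wire), len(binary_wire))  # binary is well under half the size here
```

Multiply that per-request cost by millions of requests and it's easy to see why protocols like this (and their generated-schema cousins) get so much attention.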