undefined | Better HN

0 pointsfoltik6mo ago0 comments

27us and 1us are both an eternity and definitely not SOTA for IPC. The fastest possible way to do IPC is with a shared memory resident SPSC queue.

The actual (one-way cross-core) latency on modern CPUs varies by quite a lot [0], but a good rule of thumb is 100ns + 0.1ns per byte.

This measures the time for core A to write one or more cache lines to a shared memory region, and core B to read them. The latency is determined by the time it takes for the cache coherence protocol to transfer the cache lines between cores, which shows up as a number of L3 cache misses.

Interestingly, at the hardware level, in-process vs inter-process is irrelevant. What matters is the physical location of the cores which are communicating. This repo has some great visualizations and latency numbers for many different CPUs, as well as a benchmark you can run yourself:

[0] https://github.com/nviennot/core-to-core-latency

0 comments

2 comments · 1 top-level

jstimpfle6mo ago· 1 in thread

I was really asking what "IPC" means in this context. If you can just share a mapping, yes it's going to be quite fast. If you need to wait for approval to come back, it's going to take more time. If you can't share a memory segment, even more time.

foltikOP6mo ago

No idea what this vibe code is doing, but two processes on the same machine can always share a mapping, though maybe your PL of choice is incapable. There aren’t many libraries that make it easy either. If it’s not two processes on the same machine I wouldn’t really call it IPC.

Of course a round trip will take more time, but it’s not meaningfully different from two one-way transfers. You can just multiply the numbers I said by two. Generally it’s better to organize a system as a pipeline if you can though, rather than ping ponging cache lines back and forth doing a bunch of RPC.

j / k navigate · click thread line to collapse