At the reception of a packet, if we didn't have a key, we'd punt the packet to userspace, asking it to do an exchange.
For some reason -- That I forget, we ended up putting the entire key into each packet. I think that when the "exchange" happened in userspace, we didn't actually store the entire box and key, instead we just generated an association specific key, and we wanted to be able to rotate this quickly. I think we just came up with the fact that doing the derivation was cheap enough that it'd be better to do it on every message.
What would then happen is that if we didn't have a box cached, we would calculate it, and this code was implemented as a helper function outside of the BPF itself. Then we did Poly1305 + XSalsa20 with a random nonce.
Why? Because IPSec is fucking complicated, and because I can?
Multicore - Yeah, we basically got that for free with BPF/XDP. It's kinda neat. You just load programs, and arrays get processes in parallel. We didn't have much state because we used random nonces, so there was no need for synchronization.
Unfortunately, the code will not be public because the project was closed source, it never shipped, and I'm no longer with the organization that paid for the work.
* BPF - Berkeley Packet Filter - BPF is a safe, in-kernel virtual machine that's meant to be able to introduce next generation functionality into the Kernel: https://lwn.net/Articles/603983/
* ECC - Elliptical Curve Encryption - Enough said
* XDP - eXpress Data Plane - This is a project atop BPF that's being merged into the Kernel to implement programmability features in the kernel, without requiring that people write crazypants code or unsafe code. It's backed, and used by Facebook. If you're familiar with Intel's DPDK, it's similar.
* SKB - Socket Buffers - These are the components that make up the data you send over the wire in the Kernel. When they come out of the kernel, they have to be flattened, and copied which is expensive. This is one of the biggest problems with usermode networking.
Our implementation was not..performant, hence why we abandoned it. Our thought was that we could make it work, but instead we ended up building our own key negotiation / management code atop some other stuff we had in order to setup the IPSec SAs.