For a hardware accelerator that may not be so bad, but if you are trying to separate the HSM over the network, to limit what an attacker can do after compromising the httpd, then every lost packet, outage, or network delay (more generally, any latency) would block the whole httpd.
Cloudflare announced a while ago that they had a way to do essentially this (with nginx+openssl) and said their solution was non-blocking, but as far as I can tell they did not publish the code.
I think if one wanted to solve this problem properly, larger architectural changes to openssl would be necessary. Please correct me if I'm wrong!
EDIT: Also, if you move out the RSA operation, ideally you'd want to distribute the work over more than one CPU core. If the operation is synchronous, you can't really do that.
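To illustrate the point about spreading the work over several cores, here is a minimal sketch in Python rather than in openssl itself. It assumes a toy modular exponentiation as a stand-in for the RSA operation; the names `private_op` and `sign_many` and the tiny key values are hypothetical and not taken from any of the projects discussed here.

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor

# Toy parameters, far too small for real use (hypothetical stand-ins for a key).
N = 3233  # modulus (61 * 53)
D = 2753  # private exponent


def private_op(message: int) -> int:
    """Stand-in for an expensive private-key operation."""
    return pow(message, D, N)


async def sign_many(executor: Executor, messages: list[int]) -> list[int]:
    """Submit every operation to the executor without blocking the event loop."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, private_op, m) for m in messages]
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    # A process pool lets the operations run on several CPU cores at once.
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(sign_many(pool, [65, 66, 67])))
```

Because the caller only awaits the results, the event loop stays free to serve other connections while the pool works. A synchronous call to `private_op` in the loop would give up both the concurrency and the extra cores.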
For Keyless SSL, it is necessary to make RSA operations asynchronous, since the operations are requested over a TCP connection across the network (which may incur large delays).
OTOH Neverbleed delegates the operations within the same server using Unix sockets. So there is no fear of such delays. And the server spawns a dedicated thread for each client thread. In other words, the delay is practically _no worse_ than what it is without Neverbleed.
As for _how much worse_ it is, calculations related to TLS handshakes may block the server for a few milliseconds. It may sound bad, but generally speaking it is negligible compared to the latency over a public network.
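The delegation pattern described above can be sketched roughly as follows. This is not Neverbleed's code: it is a Python illustration that assumes a toy RSA key, a made-up 8-byte wire format, and a helper thread connected through `socket.socketpair()`. Real privilege separation would put the key holder in a separate process; a thread is used here only to keep the example short. The names `key_holder`, `remote_sign`, and `demo` are hypothetical.

```python
import socket
import struct
import threading

N, E, D = 3233, 17, 2753  # toy RSA key, for illustration only


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, or raise if the peer closed the socket."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf += chunk
    return buf


def key_holder(sock: socket.socket) -> None:
    """Serve private-key operations; only this side ever uses D."""
    with sock:
        while True:
            try:
                (message,) = struct.unpack("!Q", recv_exact(sock, 8))
            except ConnectionError:
                return  # the client went away
            sock.sendall(struct.pack("!Q", pow(message, D, N)))


def remote_sign(sock: socket.socket, message: int) -> int:
    """Client side: send the request and block until the key holder answers."""
    sock.sendall(struct.pack("!Q", message))
    (signature,) = struct.unpack("!Q", recv_exact(sock, 8))
    return signature


def demo() -> bool:
    client, server = socket.socketpair()  # an AF_UNIX pair on POSIX systems
    worker = threading.Thread(target=key_holder, args=(server,), daemon=True)
    worker.start()
    with client:
        signature = remote_sign(client, 65)
    worker.join(timeout=1)
    return pow(signature, E, N) == 65  # verify with the public exponent


if __name__ == "__main__":
    print(demo())
```

Note that `remote_sign` is synchronous: the calling thread waits for the reply, which is the blocking behaviour being debated in this thread. The round trip over a local socket is short, but the private-key computation itself still takes as long as it would in-process.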
Many servers are multi-threaded, but many are not. Using the proposed technique in a Node.js process, or nginx, is going to severely limit the number of new connections per second.
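A back-of-the-envelope version of that limit, as a sketch: it assumes each handshake blocks one event loop for a fixed time and that the loop does nothing else in the meantime. The blocking times below are assumed values for "a few milliseconds", not measurements, and the function name is hypothetical.

```python
def max_new_connections_per_second(block_ms: float, event_loops: int = 1) -> float:
    """Upper bound when each handshake blocks one event loop for block_ms."""
    if block_ms <= 0:
        raise ValueError("block_ms must be positive")
    return event_loops * 1000.0 / block_ms


if __name__ == "__main__":
    for block_ms in (1.0, 2.0, 5.0):  # assumed values, not measurements
        print(block_ms, max_new_connections_per_second(block_ms))
```

Under these assumptions the bound scales linearly with the number of event loops (for example, worker processes), so a single-process server is the worst case.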
> Q. How much is the overhead?
> Virtually none.
> On my Linux VM running on Core i7 @ 2.4GHz (MacBook Pro 15" Late 2013)...
Would love to see it on a high-end system that's primarily doing termination.
I wish all security improvements were as simple and easy.