Sandboxing isn't done for performance reasons, which is why you can disable it in Bitwig. The sole purpose of sandboxing a plugin in its own process is crash isolation: it is the only way to survive a segfault in a plugin and keep a shared library from taking the host down with it.
My experience from actually writing low-latency schedulers in user space, as well as the publicly available material (Ardour's, for example), suggests different conclusions from yours.
Keep in mind that a naive benchmark like "CPU usage" is entirely meaningless. What you should look at is the round-trip latency required to stay under a given threshold of underruns (missed deadlines). Threading requires additional latency, and process synchronization even more. While I'm sure you see fewer underruns when splitting plugins off into sandboxes, I'm skeptical that it even matches, in terms of latency, what you'd get from simply doubling or tripling the buffer size in the first place.
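
To put rough numbers on the buffer-size comparison, here is a back-of-the-envelope sketch. It assumes a 48 kHz sample rate and counts the nominal round trip as two periods (one input buffer plus one output buffer), ignoring converter and driver overhead. The function names are mine, not from any host or library.

```python
def period_ms(frames: int, sample_rate: int) -> float:
    """Duration of one audio buffer (period) in milliseconds."""
    return 1000.0 * frames / sample_rate


def nominal_round_trip_ms(frames: int, sample_rate: int, periods: int = 2) -> float:
    """Nominal round-trip latency: `periods` buffers of `frames` each.

    Assumption: two periods (one in, one out). Real hardware and
    drivers add latency on top of this figure.
    """
    return periods * period_ms(frames, sample_rate)


# Print the nominal round trip for a few common buffer sizes at 48 kHz.
for frames in (64, 128, 256):
    latency = nominal_round_trip_ms(frames, 48000)
    print(f"{frames:>4} frames: {latency:.2f} ms")
```

Under these assumptions the nominal round trip scales linearly with buffer size, so whatever latency thread or process synchronization costs you has to be weighed against simply going up one buffer size at the same underrun rate.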