I've actually been hacking on a similar FOSS project lately, focused on building what I'm calling a layer 3 service mesh for the edge. It more or less came out of my learned hatred for managing mTLS at scale, and my dislike for shoving everything through an L7 proxy (insane protocol complexity, weird bugs, and you still have the problem of authenticating that you're actually talking to the proxy you expect).
Last week I shipped the first release of the userspace router. It's worth a look if you want to play around with a completely userspace, unprivileged, WireGuard-compatible VPN server.
https://github.com/noisysockets/nsh/blob/main/docs/router.md
https://github.com/google/gvisor/tree/go
go get gvisor.dev/gvisor/pkg/tcpip@go
The go branch is auto-generated, with all of the generated code checked in.
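If you just want to kick the tires, here's roughly what standing up the netstack from that branch looks like. This is a minimal sketch of my own, not anything from the gVisor docs, and the tcpip API does move around between releases, so treat the details as approximate:

```go
package main

import (
	"log"

	"gvisor.dev/gvisor/pkg/tcpip/link/channel"
	"gvisor.dev/gvisor/pkg/tcpip/network/ipv4"
	"gvisor.dev/gvisor/pkg/tcpip/stack"
	"gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
)

func main() {
	// A stack with just IPv4 and TCP registered.
	s := stack.New(stack.Options{
		NetworkProtocols:   []stack.NetworkProtocolFactory{ipv4.NewProtocol},
		TransportProtocols: []stack.TransportProtocolFactory{tcp.NewProtocol},
	})

	// A channel endpoint shuttles packets to and from your own Go code,
	// so there is no TUN device and no elevated privileges involved.
	ep := channel.New(256 /* queue length */, 1500 /* MTU */, "")
	if err := s.CreateNIC(1, ep); err != nil {
		log.Fatalf("CreateNIC: %v", err)
	}
}
```

From there you wire the channel endpoint to whatever actually carries your packets (a UDP socket, a WireGuard tunnel, etc.) and use the gonet adapters when you want ordinary net.Conn semantics.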
> really difficult to keep the version of gVisor I was using up to date
For our project, we update gvisor whenever Tailscale does.
I don't know the status of those export tools these days, as I left the company years ago, but if they could sync to a different branch, that would help various folks quite a bit. For example, tsnet users often fall into the trap of running `go get -u`, which then pulls in a non-functional gvisor version.
I think the solution is an automatically exported repository at a different path. Kind of (or maybe exactly) like what Tailscale/bradfitz used to maintain.
Unlike, say, GitHub Codespaces, running something like this on your own infra means your incentives and Coder.com's are aligned: you both want to reduce your cloud costs. (GitHub running on Azure, by contrast, gives them an opportunity and an incentive to mark up Azure cloud costs.)
We’ve tried to align our pricing with the value of the product. In small teams the productivity gains seem to be much lower, so we target Enterprise!
But exfiltrating data with a userspace VPN is totally fine?
I'm also wondering why not use TLS.
The reason you'd use WireGuard rather than TLS is that it allows you to talk directly to multiple services, using multiple protocols (most notably, things like Postgres and Redis), without having to build custom server-side "gateways" for each of those protocols.
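To make that concrete, here's roughly what it looks like with wireguard-go's userspace netstack. A sketch only: the addresses, keys, and endpoint are placeholders you'd fill in, and CreateNetTUN's signature has changed across wireguard-go versions:

```go
package main

import (
	"log"
	"net/netip"

	"golang.zx2c4.com/wireguard/conn"
	"golang.zx2c4.com/wireguard/device"
	"golang.zx2c4.com/wireguard/tun/netstack"
)

func main() {
	// Userspace WireGuard over a gVisor-derived netstack: no TUN device,
	// no elevated privileges.
	tun, tnet, err := netstack.CreateNetTUN(
		[]netip.Addr{netip.MustParseAddr("10.0.0.2")}, // our tunnel address (placeholder)
		[]netip.Addr{netip.MustParseAddr("1.1.1.1")},  // DNS
		1420)
	if err != nil {
		log.Fatal(err)
	}

	dev := device.NewDevice(tun, conn.NewDefaultBind(),
		device.NewLogger(device.LogLevelError, ""))
	// UAPI config; the keys and endpoint below are placeholders.
	if err := dev.IpcSet(`private_key=<hex-encoded key>
public_key=<hex-encoded peer key>
endpoint=vpn.example.com:51820
allowed_ip=10.0.0.0/24`); err != nil {
		log.Fatal(err)
	}
	if err := dev.Up(); err != nil {
		log.Fatal(err)
	}

	// Dial Postgres directly over the tunnel; Redis or anything else
	// that speaks TCP works the same way, with no per-protocol gateway.
	c, err := tnet.Dial("tcp", "10.0.0.10:5432")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	log.Println("connected to", c.RemoteAddr())
}
```

The same tnet handle can dial any number of upstream services on different ports and protocols, which is exactly the point above.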
And then you're suddenly in a whole world of pain because all of this is driven by a stack of byzantine certifications (half of which, as usual, are bogus, but that doesn't help you), and your network stack has none of them.
(Written from first-hand experience.)
Pretty much the only thing you can do is filter out known-bad, untargeted outbound traffic, such as malware payloads with very clear signatures. And this only works while the exfiltration is untargeted: as soon as there's a person who actually wants to get data out, they can skirt around the filter again.
> We are committed to keeping your data safe through end-to-end encryption and to making Coder easy to run across a wide variety of systems from client laptops and desktops to VMs, containers, and bare metal. If we used the TCP implementation in the OS, we’d need a way for the TCP packets to get from the operating system back into Coder for encryption. This is called a TUN device in unix-style operating systems and creating one requires elevated permissions, limiting who can run Coder and where. Asking for elevated permissions inside secure clusters at regulated financial enterprises or top secret government networks is at best a big delay and at worst a nonstarter.
The specific part that's unclear is why encryption needs to be applied at the TCP layer, and, if they do need it at the transport layer, why they're not using something like QUIC, which has much more mature userspace implementations.
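For comparison, QUIC needs nothing but an unprivileged UDP socket. A minimal sketch with quic-go (recent versions of the library; the address and ALPN string here are made up, and a real client needs proper certificate handling):

```go
package main

import (
	"context"
	"crypto/tls"
	"log"

	"github.com/quic-go/quic-go"
)

func main() {
	ctx := context.Background()

	// Encrypted, multiplexed transport entirely in userspace, over a
	// plain UDP socket: no TUN device, no elevated privileges.
	conn, err := quic.DialAddr(ctx, "server.example.com:4433",
		&tls.Config{NextProtos: []string{"example-proto"}}, nil)
	if err != nil {
		log.Fatal(err)
	}

	stream, err := conn.OpenStreamSync(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer stream.Close()

	if _, err := stream.Write([]byte("hello")); err != nil {
		log.Fatal(err)
	}
}
```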
As I understand it, the only reason you'd use a TUN interface is if you want to send/receive raw IP packets. Their marketing doesn't make it very clear what their product does, but I can't see a reason it would need to send/receive raw IP packets rather than TCP/UDP traffic over a specific port...
I surmise that the reason might be that a userspace tunnel can be faster (like maybe running TCP over UDP or something to gain speed improvements).
Good post nevertheless.
> large multiple performance decrease per dollar spent
gVisor helps you offer multi-tenant products, which can actually be much cheaper to operate and offer to customers, especially when their usage is lower than a single VM would require. Also, a lot of applications won't see big performance hits from running under gVisor, depending on their resource requirements and perf bottlenecks.
The performance documents you linked claim, vs. runc: 20-40x syscall overhead, half of Redis's QPS, and a 20% increase in runtime on a sample TensorFlow script. Also, google "Cloud Run slow" and "Digital Ocean Apps slow"; both are gVisor.
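If you want to sanity-check the syscall figure yourself rather than argue from the docs, a crude probe is enough. Run it under runc and then under runsc (e.g. `docker run --runtime=runsc`) and compare; getpid is about the cheapest syscall there is, so this isolates interception overhead rather than approximating any real workload:

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

func main() {
	// Time a tight loop of a trivial syscall to expose per-call overhead.
	const n = 1_000_000
	start := time.Now()
	for i := 0; i < n; i++ {
		syscall.Getpid()
	}
	fmt.Printf("%v per getpid call\n", time.Since(start)/n)
}
```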
Literally anything else.
System call overhead does matter, but it's not the ultimate measure of anything. If it were, gVisor with the KVM platform would be faster than native containers (see the runsc-kvm data point, which you've ignored for an unknown reason). But it is obviously more complex than that alone. For example, let's click down and ask how it's even possible to be faster: the default Docker seccomp profile installs a BPF filter that itself slows system calls by 20x (and that path does not apply within the guest context). On that basis, should you start shouting that everyone should stop using Docker because of the system call overhead? I would hope not, because looking at any one figure in isolation is dumb; consider the overall application and architecture. Containers themselves have a cost (higher context switch time due to cgroup accounting, the cost of devirtualizing namespaces in many system calls, etc.), but it's obviously worth it in most cases.
The Redis case is called out as a worst case: the application itself does very little beyond dispatching I/O, so almost everything manifests as overhead. But if you're doing something where the overhead is 20%, you need hard security boundaries, and fine-grained multi-tenancy can lower costs by 80%, it might make perfect sense. If something doesn't work for you because your trade-offs are different, just don't use it!
But given this article is about significantly improving gVisor's userland TCP performance, it seems like the netstack side causes major performance losses too.
I saw a GitHub link in another top article today, https://github.com/misprit7/computerraria, whose README's Pitch section feels very relevant to gVisor.
The netstack stuff here has nothing to do with the rest of gVisor.
You'll note their Node/Ruby benchmarks showed a substantially bigger performance hit. That's because the other gVisor sandboxing functionality (general syscall + file I/O interception) has more of an impact on performance, but also because these are network-processing-bound applications (rare) that were still reaching high QPS in absolute terms for their respective runtimes (do you know many real-world Node apps doing 350-800 QPS per instance?).
Because Coder is unlikely to be bottlenecked by CPU availability for networking, the resource overhead should be inconsequential; what really matters is the impact on user latency. And that's likely on the order of 1ms for a round trip that is already spending probably 30-50ms at best in transit between client and server (given that Coder's server would be running in a datacenter with clients at home or at the office), plus the actual application-logic overhead, which is at best 10ms. That's very similar to a lot of gVisor netstack use cases, which is why it's not as big a deal as you think it is.
TL;DR: For the thing you'd actually care about in the Coder use case (round-trip latency), the perf hit of using gVisor netstack should be around 2% at most, and most likely much less. Either way, it's small enough to be imperceptible to the actual human using the client.
But after I left, I heard that a lot of the poor performance of Cloud Run is just plain old oversubscribed shared-core e2 instances.
Google engineers recently rewrote the GSO bit, though unlike Tailscale's version it only covers TCP. Besides, gVisor has had "software" and "hardware" GSO support for as long as I can remember.
This is approximately the case for any alternative IP stack you might pick, though: a mature IP stack is a huge undertaking, given all the many flavors of enhancements to IP (and particularly TCP) over the years, the high variance in platform behaviors and configurations, and so on.
In general, you should only take a dependency on a lesser-used IP stack if you're willing to retain or train IP experts in-house over the long haul, because, as demonstrated here, taking on such a dependency means you'll eventually find a business need for that expertise. If that's way outside your budget or wheelhouse, it might be worth skipping.
I see an explanation in their blog about avoiding TUN devices because they require elevated permissions, but why would you need a TUN device to get data to and from an application in the first place? I can't tell what their product does from the marketing material, but it doesn't look like it would require constructing raw IP packets rather than sending TCP/UDP traffic and letting the OS wrap it in the lower layers.
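To illustrate the asymmetry, here's a quick sketch using wireguard-go's tun package (the device name is arbitrary): on Linux the TUN path fails without CAP_NET_ADMIN, while an ordinary socket needs no privileges at all:

```go
package main

import (
	"log"
	"net"

	"golang.zx2c4.com/wireguard/tun"
)

func main() {
	// Creating a TUN device hands you raw IP packets, but requires
	// CAP_NET_ADMIN; run this unprivileged and it fails.
	if _, err := tun.CreateTUN("tun-test", 1420); err != nil {
		log.Printf("TUN creation failed (expected without CAP_NET_ADMIN): %v", err)
	}

	// A plain UDP socket, by contrast, is available to any process.
	c, err := net.Dial("udp", "example.com:53")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	log.Println("plain UDP socket opened fine:", c.LocalAddr())
}
```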
"Portable" is a bit of a weird word here, because for many of us with gray beards it means architectures, kernels, and systems. In this context it seems to mean "can run just as easily on my MacBook as in a cloud container". In practice, though, the software isn't that portable, because Go isn't that portable, at least not compared to a niche C "portable network stack" that would build roughly anywhere there's a working C toolchain, which is almost everywhere.
Constant kernel security fixes are a real pain in deployments unless you follow upstream kernels closely. If your business is shipping Linux runtimes at high packing density, you really need to find ways to minimize the exposed Linux surface area, or organize so you can ship upstream kernel updates at an extremely high frequency (relative to normal infrastructure upgrade rates for kernels and mandatory reboots). And I would not consider kexec safe in this kind of context, at all.
An alternative approach might be Firecracker / microVMs and so on, but those have their own tradeoffs too. The core point is that you want more than one layer between the host machines and the user code that wants to interact with Linux features.
> we’d need a way for the TCP packets to get from the operating system back into Coder for encryption.
Yes, and this is commonly done via OpenSSL, for example.
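In Go terms, since most of this thread is Go, the equivalent with the standard library's crypto/tls looks like this (a trivial sketch): the kernel hands the process a plain TCP stream, and encryption happens in userspace with no TUN device involved.

```go
package main

import (
	"crypto/tls"
	"log"
)

func main() {
	// The OS provides ordinary TCP; TLS encrypts above it, in-process.
	conn, err := tls.Dial("tcp", "example.com:443", &tls.Config{})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Println("negotiated", tls.VersionName(conn.ConnectionState().Version))
}
```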
> This is called a TUN device in unix-style operating systems and creating one requires elevated permissions
Waitasec, wut? Sure, you could use a TUN device, I guess, but (assuming some kind of multi-tenant separation is an underlying requirement they didn't mention in their intro) couldn't you also use cgroup'd containers? Sorry if I'm not fluent in the terminology.
I'm struggling to understand the constraints that push them towards gVisor. Simply needing to do encryption doesn't seem like justification. I'm sure they have very good reasons, but needing to satisfy a financial regulator seems orthogonal at best. I would just like to understand those reasons.
† I don't think so? I didn't see them say that, and we do the same thing without creating raw sockets.