I've actually been hacking on a similar FOSS project lately, with a focus on building what I'm calling a layer 3 service mesh for the edge. More or less came out of my learned hatred for managing mTLS at scale and my dislike for shoving everything through a L7 proxy (insane protocol complexity, weird bugs, and you still have the issue of authenticating you are actually talking to the proxy you expect).
Last week I got the first release of the userspace router shipped, worth taking a look if you want to play around with a completely userspace and unprivileged WireGuard compatible VPN server.
https://github.com/noisysockets/nsh/blob/main/docs/router.md
https://github.com/google/gvisor/tree/go
go get gvisor.dev/gvisor/pkg/tcpip@go
The go branch is auto generated with all of the generated code checked in.
I don't know the status on those export tools these days as I left the company years ago, but if they could sync with a different branch.
This would help various folks quite a bit, as for example tsnet users often fall into the trap of trying to do `go get -u`, which then pulls a non-functional gvisor version.
Unlike, say, GitHub Codespaces, running something like this on your own infra means your incentives and Coder.com's are aligned, i.e. both of you want to reduce your cloud costs (as opposed to, say, GitHub running on Azure gives them an opportunity and incentive to mark up on Azure cloud costs).
We’ve tried to align our pricing with the value of the product. In small teams the productivity gains seem to be much lower, so we target Enterprise!
But exfiltrating data with a userspace VPN is totally fine?
I'm also wondering why not use TLS.
The reason you'd use WireGuard rather than TLS is that it allows you to talk directly to multiple services, using multiple protocols (most notably, things like Postgres and Redis) without having to build custom serverside "gateways" for each of those protocols.
And then you're suddenly in a whole world of pain because all of this is driven by a stack of byzantine certifications (half of which, as usual, are bogus, but that doesn't help you), and your network stack has none of them.
(Written from first-hand experience.)
Pretty much the only thing you can do is somewhat filter out known-bad, not directly motivated outbound traffic, such as malware payloads with very clear signatures. This only works if it's "not directly motivated", because as soon as there's a person who wants to do it, they can skirt around it again.
> We are committed to keeping your data safe through end-to-end encryption and to making Coder easy to run across a wide variety of systems from client laptops and desktops to VMs, containers, and bare metal. If we used the TCP implementation in the OS, we’d need a way for the TCP packets to get from the operating system back into Coder for encryption. This is called a TUN device in unix-style operating systems and creating one requires elevated permissions, limiting who can run Coder and where. Asking for elevated permissions inside secure clusters at regulated financial enterprises or top secret government networks is at best a big delay and at worst a nonstarter.
The specific part that’s unclear is why encryption needs to be applied at the TCP layer and at that point if they need it at the transport layer why they’re not using something like QUIC which has a much more mature user-space implementation.
> large multiple performance decrease per dollar spent
Gvisor helps you offer multi-tenant products which can be actually much cheaper to operate and offer to customers, especially when their usage is lower than a single VM would require. Also, a lot of applications won't see big performance hits from running under Gvisor depending on their resource requirements and perf bottlenecks.
Their performance documents you linked claim vs runc: 20-40x syscall overhead, half of redis' QPS, and a 20% increase in runtime in a sample tenserflow script. Also google "CloudRun slow" and "Digital Ocean Apps slow", both are Gvisor.
Literally anything else.
But given this article is about improving gvisors userland tcp performance significantly, it seems like the netstack stuff causes major performance losses too.
I saw a github link in another top article today https://github.com/misprit7/computerraria where the Readme's Pitch section feels very relevant to gvisor.
Google engs recently rewrote the GSO bit, but unlike Tailscale, it is only for TCP, though.
Besides, gvisor has had "software" & "hardware" GSO support for as long as I can remember.
This is approximately the case for any alternative IP stack you might pick though, a mature IP stack is a huge undertaking with all the many flavors of enhancements to IP and particularly TCP over the years, the high variance in platform behaviors and configurations and so on.
In general you should only take on a dependency of a lesser-used IP stack if you're willing to retain or train IP experts in house over the long haul, because as is demonstrated here, taking on such a dependency means eventually you'll find a business need for that expertise. If that's way outside of your budget or wheelhouse, it might be worth skipping.
I see an explanation in their blog about avoiding TUN devices since they require elevated permissions, but why would you need a TUN device to send data to/from an application? I can't understand what their product does from the marketing material but it doesn't look like it would require constructing raw IP packets instead of TCP/UDP packets and letting the OS wrap them in the other layers.
> we’d need a way for the TCP packets to get from the operating system back into Coder for encryption.
yes, this is commonly done via OpenSSL for example.
> This is called a TUN device in unix-style operating systems and creating one requires elevated permissions
waitasec, wut? sure you could use a TUN device I guess, but assuming some kind of multi-tenant separation is an underlying assumption they didn't mention in their intro, couldn't you also use cgroup'd containers? sorry if I'm not fluent in the terminology.
i'm struggling to understand the constraints that push them towards gVisor. simply needing to do encryption doesn't seem like justification. i'm sure they have very good reasons, but needing to satisfy a financial regulator seems orthogonal at best. i would just like to understand those reasons.
† I don't think? I didn't see them say that, and we do the same thing and we don't create raw sockets.