Static IPs for Serverless Containers

(modal.com)

125 points · ekzhang · 1y ago · 66 comments

ekzhang (OP) · 1y ago

Hi! This is a blog post sharing some low-level Linux networking we're doing at Modal with WireGuard.

As a serverless platform we hit a tricky tradeoff: we run multi-tenant user workloads on machines around the world, and each serverless function is an autoscaling container pool. How do you let users give their functions static IPs while keeping those IPs decoupled from wherever the compute happens to run?

We needed a high-availability VPN proxy for containers and didn't find one, so we built our own on top of WireGuard and open-sourced it at https://github.com/modal-labs/vprox

Let us know if you have thoughts! I'm relatively new to low-level container networking, and we (me + my coworkers Luis and Jeffrey + others) have enjoyed working on this.

crishoj · 1y ago

Neat. I am curious what notable differences there are between Modal and Tailscale.

ekzhang (OP) · 1y ago

Thanks. We did check out Tailscale, but they didn't quite have what we were looking for: some high-availability custom component that plugs into a low-level container runtime. (Which makes sense, it's pretty different from their intended use case.)

Modal is actually a happy customer of Tailscale (but for other purposes). :D

xxpor · 1y ago

You're using containers as a multi-tenancy boundary for arbitrary code?

ekzhang (OP) · 1y ago

We use gVisor! It's an open-source application security sandbox spun off from Google. We work with the gVisor team to get the features we need (notably GPU / CUDA support) and also help test gVisor upstream: https://gvisor.dev/users/

It's also used by Google Kubernetes Engine, OpenAI, and Cloudflare among others to run untrusted code.

dangoodmanUT · 1y ago

As a predominantly Rust shop, why choose Go for this?

klooney · 1y ago

WireGuard's premier userspace implementation is in Go.

cactacea · 1y ago

Static IPs for allowlists need to die already. It's 2024, come on, surely we can do better than this.

ekzhang (OP) · 1y ago

What would you suggest as an alternative?

sofixa · 1y ago

JWT/OIDC, where the thing you're authenticating to (like MongoDB Atlas) trusts your identity provider (AWS, GCP, Modal, GitLab CI). It's better than mTLS because it allows for more flexibility in claims (extra metadata and security checks can be done with arbitrary data provided by the identity provider), and JWTs are usually shorter lived than certificates.
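
To make the point about claims concrete, here is a minimal sketch in Python of the kind of policy check a service could run on a token's claims. It assumes the token's signature has already been verified against the identity provider's published keys, which this sketch does not do. The function name `check_claims`, the `workspace` claim, and the example issuer and audience values are hypothetical; only `iss`, `aud`, and `exp` are standard JWT claim names.

```python
import time
from typing import Any, Mapping, Optional


def check_claims(
    claims: Mapping[str, Any],
    trusted_issuer: str,
    audience: str,
    required: Mapping[str, Any],
    now: Optional[float] = None,
) -> bool:
    """Return True if already-verified JWT claims satisfy the policy.

    Signature verification is assumed to have happened before this call.
    """
    now = time.time() if now is None else now

    # The token must come from the identity provider we trust.
    if claims.get("iss") != trusted_issuer:
        return False

    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        return False

    # Short-lived tokens: reject anything at or past its expiry time.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        return False

    # Extra metadata checks on arbitrary claims supplied by the provider.
    return all(claims.get(key) == value for key, value in required.items())
```

For example, `check_claims(claims, "https://idp.example", "db.example", {"workspace": "prod"})` would only pass for an unexpired token issued by that provider, for that audience, carrying `workspace: prod`.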

thatfunkymunki · 1y ago

a more modern, zero-trust solution like mTLS authentication

klysm · 1y ago

Completely agree. IP addresses are almost never a good means of authentication. Relying on them results in brittle and inflexible architecture as well. Applications become aware of layers they should be abstracted from.

bogantech · 1y ago

Firewalls exist; many network environments block everything not explicitly allowed.

Authentication is only part of the problem. Networks are firewalled (with dedicated appliances) and segmented to prevent lateral movement in the event of a compromise.

ATechGuy · 1y ago

> Modal has an isolated container runtime that lets us share each host’s CPU and memory between workloads.

Looks like Modal hosts workloads in containers, not VMs. How do you enforce secure isolation with this design? A single kernel vulnerability could lead to remote execution on the host, impacting all workloads. Am I missing anything?

ekzhang (OP) · 1y ago

I mentioned this in another comment thread, but we use gVisor to enforce isolation. https://gvisor.dev/users/

It's also used by Google Kubernetes Engine, OpenAI, and Cloudflare among others to run untrusted code.

yegle · 1y ago

And Google's own serverless offerings (App Engine, Cloud Run, Cloud Functions) :-)

Disclaimer: I'm an SRE on the GCP Serverless products.

fusjdffddddddds · 1y ago

It's going to take years for orgs to adopt IPv6 and mTLS+JWT/OIDC.

Even longer for QUIC/H3.

klysm · 1y ago

I’m not convinced that mTLS or OIDC are good ideas.

fusjdffddddddds · 1y ago

... Are you going to say why?

eqvinox · 1y ago

I guess my first question is, why is this built on IPv4 rather than IPv6...

ekzhang (OP) · 1y ago

Yeah, great question. This came up at the beginning of design. A lot of our customers specifically needed IPv4 whitelisting. For example, MongoDB Atlas (a very popular database vendor) only supports IPv4. https://www.mongodb.com/community/forums/t/does-mongodb-atla...

The architecture of vprox is pretty generic though and could support IPv6 as well.

eqvinox · 1y ago

I guess that works until other customers need access to IPv6-only resources… (e.g.: we've stopped rolling IPv4 to any of our CI. No IPv6, no build artifacts…)

In a perfect world I'd also be asking whether you considered NAT64, but unfortunately I'm well aware that's a giant world of pain to get to work on Linux (involving either out-of-tree Jool, or full-on VPP)

nodesocket · 1y ago

Couldn't a NAT instance in front of the containers accomplish this as well (assuming it's only needed for outbound traffic)? The open source project fck-nat[1] looks amazing for this purpose.

[1] https://fck-nat.dev/stable/

ekzhang (OP) · 1y ago

Right, vprox servers act as multiplexed NAT instances with a VPN attached. You do still need the VPN part though since our containers run around the world, in multiple regions and availability zones. Setting the gateway to a machine running fck-nat would only work if that machine is in the same subnet (e.g., for AWS, in one availability zone).
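
The same-subnet constraint can be illustrated with a small sketch using Python's standard `ipaddress` module. It only checks whether a candidate gateway address is on-link for a host, i.e. inside the host's own subnet; the function name `gateway_on_link` and the addresses are made up for illustration and say nothing about how any particular cloud lays out its networks.

```python
import ipaddress


def gateway_on_link(host_cidr: str, gateway_ip: str) -> bool:
    """Return True if gateway_ip falls inside the host's own subnet.

    host_cidr is the host's address with its prefix length, e.g. "10.0.1.25/24".
    """
    network = ipaddress.ip_interface(host_cidr).network
    return ipaddress.ip_address(gateway_ip) in network


# A gateway in the host's /24 is directly reachable as a next hop.
print(gateway_on_link("10.0.1.25/24", "10.0.1.1"))
# A gateway in a different /24 is not, so it cannot simply be set as the default gateway.
print(gateway_on_link("10.0.1.25/24", "10.0.2.1"))
```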

The other features that were hard requirements for us were multi-tenancy and high availability / failover.

By the way, fck-nat is just a basic shell script that sets the `ip_forward` and `rp_filter` sysctls and adds an IP masquerade rule. If you look at vprox, we also do this but build a lot on top of it. https://github.com/modal-labs/vprox
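
As a rough sketch of those three steps, the Python below only builds the commands and does not run them. The function name `nat_setup_commands`, the interface name, and the `rp_filter` value are assumptions for illustration (2 is the kernel's "loose" mode; the value fck-nat or vprox actually uses is not stated here).

```python
from typing import List


def nat_setup_commands(out_iface: str, rp_filter: int = 2) -> List[List[str]]:
    """Build the commands for a basic NAT box: two sysctls plus one masquerade rule.

    rp_filter=2 is an assumed value, not taken from fck-nat or vprox.
    """
    if rp_filter not in (0, 1, 2):
        raise ValueError("rp_filter must be 0 (off), 1 (strict), or 2 (loose)")
    return [
        # Let the kernel forward packets between interfaces.
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        # Adjust reverse-path filtering so forwarded traffic is not dropped.
        ["sysctl", "-w", f"net.ipv4.conf.all.rp_filter={rp_filter}"],
        # Rewrite the source address of outbound packets to the NAT box's own IP.
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-o", out_iface, "-j", "MASQUERADE"],
    ]


for command in nat_setup_commands("eth0"):
    print(" ".join(command))
```

Running the printed commands requires root on a Linux host; the sketch leaves that step out on purpose.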

nodesocket · 1y ago

Ahh, that makes sense. I do think that a single fck-nat instance can service multiple AZs in an AWS region, though. You just need to adjust the VPC routing table. Thanks for the reply and info.

jimmyl02 · 1y ago

this is a really neat writeup! the design choice to make each "exit node" control the local wireguard connections instead of a global control plane is pretty neat.

an unfinished project I worked on (https://github.com/redpwn/rvpn) was a bit more ambitious with a global control plane, and I quickly learned that supporting multiple clients, especially for anything networking related, is a tarpit. the focus on linux / aws specifically here and the results achievable from it are nice to see.

networking is challenging and this was a nice deep dive into some networking internals, thanks for sharing the details :)

ekzhang (OP) · 1y ago

Thanks for sharing. I'm interested in seeing what a global control plane might look like, seems like authentication might be tricky to get right!

Controlling our worker environment (like `net.ipv4.conf.all.rp_filter` sysctl) is a big help for us since it means we don't have to deal with the fullness of all possible network configurations.

heinternets · 1y ago

So much work seems to go into working around the limitations of IPv4 instead of towards a fully IPv6 capable world.

klysm · 1y ago

Unfortunately we gotta do both. Overlay networks like WireGuard might be a good stepping stone to move software towards IPv6 anyway.

qianli_cs · 1y ago

Thanks for sharing. This new feature is neat! It might sound a bit out there, but here's a thought: could you enable assigning unique IP addresses to different serverless instances? For certain use cases, like web scraping, it's helpful to simulate requests coming from multiple locations instead of just one. I think allowing requests to originate from a pool of IP addresses would be doable given this proxy model.

stuckkeys · 1y ago

This is just what I needed. Chef's kiss.

klysm · 1y ago

Why is it important to have a static outbound IP address?

handfuloflight · 1y ago

Do you block certain ports?

techn00 · 1y ago

side question: what do you use to make the diagrams?
