If you really need consistency for the environment - Let them own the machine, and then give them a stable base VM image, and pay for decent virtualization tooling that they run... on their own machine.
I have seen several attempts to move dev environments to a remote host. They invariably suck.
Yes - that means you need to pay for decent hardware for your devs, it's usually cheaper than remote resources (for a lot of reasons).
Yes - that means you need to support running your stack locally. This is a good constraint (and a place where containers are your friend for consistency).
Yes - that means you need data generation tooling to populate a local env. This can be automated relatively well, and it's something you need with a remote env anyways.
---
The only real downside is data control (i.e. the company has less control over how a developer manages assets like source code). In my experience, the vast majority of companies should worry less about this - your value as a company isn't your source code in 99.5% of cases, it's the team that executes that source code in production.
If you're in the 0.5% of other cases... you know it and you should be in an air-gapped closed room anyways (and I've worked in those too...)
The developers also lack knowledge about the environment; can't evolve the environment; can't test the environment for bugs; and invariably interfere with each other because it's never isolated well. And also, yes, it adds lag.
Anyway, yes, working locally on fake data that bears little resemblance to production still beats remote environments.
Trying to boot the full service on a single machine required every single developer in the company to install ~50ish microservices on their machine for things to work correctly. It became totally intractable.
I guess one can grumble about bad architecture all day, but this had to be solved. We had to move to remote development environments, which restored everyone’s sanity.
Both FAANG companies I’ve worked at had remote dev environments that were built in house.
This is certainly one of the critical mistakes you made.
No developer needs to launch half of the company's services to work on a local deployment. That's crazy, and awfully short-sighted.
The only services a developer ever needs to launch locally are the ones that are being changed. Anything else they can consume straight out of a non-prod development environment. That's what non-prod environments are for. You launch your local service locally, you consume whatever you need to consume straight from a cloud environment, you test the contract with a local test set, and you deploy the service. That's it.
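A minimal sketch of that workflow (the launcher and variable names here are made up; the point is just "local process, remote dependencies"):

    # point the one service you're changing at the shared non-prod environment
    export USERS_API_URL="https://users.dev.example.internal"
    export DATABASE_URL="postgres://app:app@dev-db.example.internal:5432/app"

    # launch only that service locally
    ./run-my-service --port 8080

Everything else stays deployed in the cloud; you just consume it.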
> I guess one can grumble about bad architecture all day but this had to be solved.
Yes, it needs to be solved. You need to launch your service locally while consuming dependencies deployed to any cloud environment. That's not a company problem. That's a problem plaguing that particular service, and one which is trivial to solve.
> Both FAANG companies I’ve worked at had remote dev environments that were built in house.
All FAANG companies I personally know did indeed have remote dev environments. They also had their own custom tool sets to deploy services locally, either in isolation or consuming dependencies deployed to the cloud.
This is not a FAANG cargo cult problem. This is a problem you created for yourself out of short-sightedness and from thinking you're too smart for your own good. Newbies know very well they need to launch one service instance alone, because that's what they are changing. Veterans know it all too well. Why on earth would anyone believe it's reasonable to launch 50 services to do anything at all? Just launch the one service you're working on. That's it. If you believe something prevents you from doing that, that's the problem you need to fix. Simple. Crazy.
Ouch. Were they using macOS at the time, with laptops that didn't have enough RAM?
I've seen that go poorly on macOS with Java-based microservices, largely due to Java VMs wanting RAM pre-assigned for each, which really chews through RAM that mostly sits around unused.
This was a few years ago though, at the tail end of Intel-based Macs, when 32 GB of RAM in a Mac laptop wasn't really an option.
This is certainly not universal among FAANGs though.
Requiring 50 services to be up is absolutely nuts, but it’s actually pretty trivial using something like Nomad locally.
You would list the services you need (or service groups) in a config file, start a command, and all services would start in containers. Sure, you need a lot of RAM with that, but on 32 GB it was working fine.
I have worked on building dev VMs for other developers who rely on a local IDE. The main sticking point is syncing and schlepping source code (something my setup avoids because the source code and editor are on the remote machine). I have tried a number of approaches, and I sympathize with the article author. So, in response to "Devs need to create the software tooling to make remote dev less painful. I mean, they're devs... making software is kind of their whole thing." <-- syncing and schlepping source code is by no means a solved problem.
I can also say that, my spacemacs config is very vanilla. Like my phone, I don't want to be messing with it when I want to code. Writing tooling for my editor environment is a sideshow for the work I am trying to finish.
It doesn't have to be like that. I've worked on a 10MLOC codebase with 500+ committers - all perfectly runnable locally, on admittedly slightly beefy dev machines. It's true that systems will grow without limit unless some force exists to counter this, but keeping your stack something you can sanely run on a development machine is well worth spending some actual effort on.
So the solution here is to not have that kind of "stack".
I mean, if it's all so big and complex that it can't be run on a laptop then you almost certainly got a lot of problems regardless. What typically happens is tons of interconnected services without clear abstractions or interfaces, and no one really understands this spaghetti mess, and people just keep piling crap on top of it.
This leads to all sorts of problems. Everywhere I've seen this happen they had real problems running stuff in production too, because it was a complex spaghetti mess. The abstracted "easy" dev-env (in whatever form that came) is then also incredibly complex, finicky, and brittle. Never mind running tests, which is typically even worse. It's not uncommon for it all to be broken for every other new person who joins, because changes somewhere broke the setup steps which are only run for new people. Everyone else is afraid to do anything with their machine "because it now works".
There are some exceptions where you really need a big beefy machine for a dev env and tests, maybe, but they're few and far between.
Sounds like you have a different problem.
The CPU required to run your stack should be minimal if it's a single user accessing it for local testing; idle threads don't consume oodles of CPU cycles doing nothing.
Memory use may be significant even in that case (depending on your stack) but let's be realistic. If your stack is so large that it alone requires more memory than a dev machine can spare with an IDE open, the cost of providing developers with capable workstations will pale in comparison to the cost of running the prod environment.
I have a client whose prod environment is 2x load balancer; 2x app server; 3x DB cluster node - all rented virtual machines. We just upgraded to higher spec machines to give headroom over the next couple of years (ie most machines doubled the RAM from the previous generation).
My old workstation bought in 2018 had enough memory that it could virtualise the current prod environment with the same amounts of RAM as prod, and still have 20GB free. My current workstation would have 80+ GB free.
In 95% of cases if you can't run the stack for a single user testing it, on a single physical machine, you're doing something drastically wrong somewhere.
Isn't this problem solved by CICD? When the developer is ready to test, they make a commit, and the pipeline deploys the code to a dev/test environment. That's how my teams have been doing it.
How tightly coupled are these systems?
https://github.com/89luca89/distrobox
It is sorta like Vagrant, but instead of using VirtualBox virtual machines you use podman containers. This way you get to use OCI images for your "dev environment", which integrates directly into your desktop.
There are some challenges related to usermode networking for non-root-managed containers, and desktop integration has some additional complications. But besides that it has almost no overhead, and you can have unfettered access to things like GPUs.
Also it is usually pretty easy to convert your normal docker or kubernetes containers over to something you can run on your desktop.
Also it is possible to use things like Kubernetes pod definitions to deploy sets of containers with podman and manage them with systemd and such. So you can have "clouds of containers" that your dev container needs access to, locally.
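For a flavor of the workflow (image and file names are illustrative):

    # create a dev container that integrates with your home directory and desktop
    distrobox create --name dev --image registry.fedoraproject.org/fedora:40
    distrobox enter dev

    # spin up supporting services from an ordinary Kubernetes pod definition
    podman kube play dev-services.yaml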
If there is a corporate need for window-specific applications then running Windows VMs or doing remote applications over RDP is a possible work around.
If everything you are targeting as a deployment is going to be Linux-everything then it doesn't make a lot of sense to jump through a bunch of hoops and cause a bunch of headaches just to avoid having it as workstation OS.
You'll run into occasional issues (e.g. if everyone is trying to run default node.js on default port) but with some basic guardrails it feels like it should be OK?
I'm remembering back to when my old company ran a lot of PHP projects. Each user just had their own development environment and their own Apache vhost. They wrote their code and tested it in their own vhost. Then we'd merge to a single separate vhost for further testing.
I am trying to remember anything about what was painful about it but it all basically Just Worked. Everyone had remote access via VPN; the worst case scenario for them was they'd have to work from home with a bit of extra latency.
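The per-developer setup really can be that small. Roughly (names are illustrative, Debian-style Apache layout assumed):

    cat > /etc/apache2/sites-available/alice.conf <<'EOF'
    <VirtualHost *:80>
        ServerName alice.dev.example.com
        DocumentRoot /home/alice/www
    </VirtualHost>
    EOF
    a2ensite alice && systemctl reload apache2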
Distrobox and podman are such a charm to use, and so easily integrated into dev environments and production environments.
The intentional daemon-free concept is so much easier to set up in practice, as there's no fiddly group management necessary anymore.
Just a 5 line systemd service file and that's it. Easy as pie.
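Something in this spirit (unit name and image are made up; podman also ships generators for exactly this):

    cat > ~/.config/systemd/user/dev-db.service <<'EOF'
    [Unit]
    Description=Postgres for local dev

    [Service]
    ExecStart=/usr/bin/podman run --rm --name dev-db -p 5432:5432 \
        -e POSTGRES_PASSWORD=dev docker.io/library/postgres:16
    ExecStop=/usr/bin/podman stop dev-db

    [Install]
    WantedBy=default.target
    EOF
    systemctl --user daemon-reload
    systemctl --user enable --now dev-db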
One of the benefits of moving away from Kubernetes to a runner-based architecture is that we can now seamlessly support cloud-based and local environments (https://www.gitpod.io/blog/introducing-gitpod-desktop).
What's really nice about this is that with this kind of integration there's very little difference in setting up a dev env in the cloud or locally. The behaviour and qualities of those environments can differ vastly though (network bandwidth, latency, GPU, RAM, CPUs, ARM/x86).
For example, when you're running on your local machine you've actually got the amount of RAM and CPU advertised :)
Kubernetes is another mess of userspace ops tools. Userspace is for composable UI not backend. Kube and Chef and all those other ops tools are backend functionality being used like UI by leet haxxors
Unfortunately, after a few hires (hand-picked by me), this is what happened:
1) People didn't want to learn Nix, nor did they want to ask me how to make something work with Nix, nor did they tell me they didn't want to learn Nix. In essence, I told them to set the project up with it, which they'd do (and which would be successful, at least initially), but I forgot that I also had to sell them on it. In one case, a developer spent all weekend (of HIS time) uninstalling Nix and making things work using the "usual crap" (as I would call it), all because of an issue I could have fixed in probably 5 minutes if he had just reached out to me (which he did not, to my chagrin). The first time I heard them voice their true feelings on it was when I pushed back on this, because I would have gladly helped... I've mentioned this on various Slacks to get feedback, and people have basically said "you either insist on it and say it's the only supported developer-environment-defining framework, or you will lose control over it" /shrug
2) Developers really like to have control over their own machines (but I failed to assume they'd also want this control over the project dependencies, since, after all, I was the one who decided to control mine with the flake.nix in the first place!)
3) At a startup, execution is everything and time is possibly too short (especially if you have kids) to learn new things that aren't simple, even if better... that unfortunately may include Nix.
4) Nix would also be perfect for deployments... except that there is no (to my knowledge) general-purpose, broadly-accepted way to deploy via Nix, except to convert it to a Docker image and deploy that, which (almost) defeats most of the purpose of Nix (see the sketch at the end of this comment).
I still believe in Nix but actually trying to use it to "perfectly control" a team's project dependencies (which I will insist it does do, pretty much, better than anything else) has been a mixed bag. And I will still insist that for every 5 minutes spent wrestling with Nix trying to get it to do what you need it to do, you are saving at least an order of magnitude more time spent debugging non-deterministic dependency issues that (as it turns out) were only "accidentally" working in the first place.
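For the record, the Docker-image route from point 4 looks roughly like this, assuming a hypothetical `dockerImage` flake output built with `dockerTools`:

    nix build .#dockerImage    # hypothetical packages.<system>.dockerImage output
    docker load < result       # dockerTools builds a loadable image tarball
    # then tag and push as usual

It works, but as said, you lose most of what makes Nix interesting along the way.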
Having a kid has drastically altered my ability to learn new things outside of work, simply due to lack of time. I never could have imagined how big of an impact having a kid would be, it's crazy!
The worst thing is when you actually manage to carve out some time to do some learning or experimentation with a new tool, library, etc only to find out that it sucks or you just don't have the time to pick up or whatever.
It took me a couple of days to get a supervisor-based setup working locally. I was the only person on the team who would run the backend and frontend when trying things out, because nobody was actually using the dev environments fully anyways. There was no buy-in for the dev environment!
I really feel like if you are in a position to determine tooling, it's so much more helpful to lean into whatever people on the ground want to use. Obviously there are times when the people on the ground don't care, but if you're spending your sweat and tears to put the square peg into the square hole suddenly you're the person with superpowers, and not the person pushing their pet project.
And sometimes that's just "wrap my thing with your thing".
I ended up going with Bazel, not because of this particular problem alone (though it was part of it; people we hired spent WEEKS trying to get a happy edit/test/debug cycle going), but because proper dependency-based test caching was sorely needed. Using Bazel and Buildbuddy brought CI down from about 17 minutes per run to 3-4 minutes for a typical change, which meant that even if people didn't want to get a local setup going, they could at least be slightly productive. I also made sure that every dependency / tool useful for developing the product was versioned in the repository, so if something needs `psql` you can `bazel run //tools/postgres/psql` and have it just work. (Hate that Postgres can't be statically linked, though.)
It was a lot of work for me, and people do gripe about some things ("I liked `go test ./...`, I can't adjust to `bazel test ...`"), but all in all, it does work well. I would do it again. Day 1 at the company; git clone our thing, install bazelisk, and your environment setup is done. All the tests pass. You can run the app locally with a simple `bazel run`. I'm pretty happy with the outcome.
Nix is something I looked into for our container images, but they just end up being too big. I never figured out why; I think a lot of things are dynamically linked and they include their own /usr/lib tree with the entire transitive dependency chain for that particular app, even if other things you have installed have some overlap with that dependency chain. I prefer the approach of statically linking everything and only including what you need. I compromised by basing things on Debian and rules_distroless, which at least lets you build a container image with the exact same sha256 on two different machines. (We previously just did "FROM scratch; COPY <statically linked binary> /app; ENTRYPOINT /app", but then started needing things like pg_dump in our image. If you can just have a single statically-linked binary be your entire app, great. Sometimes you can't, and then you need some sort of reasonable solution. Also everything ends up growing a dependency on ca-certificates...)
It's not the learning new things that's a problem, but rather the fact that every little issue turns into a 2-day marathon that's eventually solved with a 1-line fix. And that's because the feedback loop and general UX is just awful - I really started to feel like I needed a sacrificial chicken.
Docker may be a dumpster fire, but at least it's generally easy to see what you did wrong and fix it.
That's true for any architectural decision in an organization with more than 1 person.
It's really not something that should make you reconsider a decision. At the end of the day, an architecture that "people" actually want to use doesn't exist; "people" don't want any singular thing.
`nix copy .#my-crap --to ssh://remote`
What you do with it then on the remote depends on your environment. At the minimum do a `nix-store --add-root` to make a symlink to whatever you just copied.
(The most painless path is if you're deploying an entire NixOS system, but that requires converting the remote host to NixOS first.)
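Concretely, assuming the flake output really is `my-crap` and you want a GC root on the remote (paths are arbitrary):

    nix build .#my-crap
    nix copy .#my-crap --to ssh://remote
    # pin it on the remote so garbage collection won't delete it
    ssh remote "nix-store -r $(readlink result) --add-root /var/lib/gcroots/my-crap"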
IMO there are some workloads where it is beneficial for a developer to have access to a local repository with at least some snippets based on previous projects.
Having a leftover PoC of some concept written for a previous employer but never elevated to team use/production is both handy (at least to confirm that the build environment is still viable after an unspecified period of toolchain updates) and ethical (copying production code is not ethical - even if the old and new products are vastly different e.g. last job was taxi app, new app is banking app).
Making it all 'remote' and 'cloud' will eventually result in a bike-reinvention penalty at each new employment: not everything can be rebuilt from memory alone, especially things that are done once or twice a year. Sure, there are open-source docs and examples, but at some point that just introduces an even heavier penalty: you need to either know a lot of open-source code well enough to have reference points, or work on pet projects to build up the same amount of references.
And the new company would also be liable for using trade secrets that they shouldn’t.
I've worked in a remote, secured development environment and it sucked, but to their credit the company did it for exactly this reason - control over the source. But bear in mind that source control is a two-way street.
Losing proprietary source can be harmful (especially in compiled languages where the source might carry much more information than the distributable). But they were mostly worried about the opposite way...that something malicious gets INTO the source which could pose an existential threat. You'd be correct to say "well that should be the domain of source control, peer review etc", but in this case the company assessed the risk high enough to do both.
I once had to burn a ton of political capital (including some on credit), because someone who didn't understand software thought that cutting-edge tech startup software developers, even including systems programmers working close to metal, could work effectively using only virtual remote desktops... with a terrible VM configuration... from servers literally halfway around the world... through a very dodgy firewall and VPN... of 10Mb/s total bandwidth... for the entire office of dozens of developers.
(And no other Internet access from the VMs. Administrators would copy whatever files from the Internet that are needed for work. And there was a bureaucratic form for a human process, if you wanted to request any code/data to go in or out. And the laptops/workstations used only as thin-clients for the remote VMs would have to be Windows and run this ridiculous obscure 'endpoint security' software that had changed hands from its ancient developer, and hadn't even updated the marketing materials (e.g., a top bulletpoint was keeping your employees from wasting time on a Web site that famously was wiped out over a decade earlier), and presumably was littered with introduced vulnerabilities and instabilities.)
Note that this was not something like DoD, nor HIPAA, nor finance. Just cutting-edge tech on which (ironically) we wanted first-mover advantage.
This escalated to the other top-titled software engineer and I together doing a presentation to C-suite, on why not only would this kill working productivity (especially in a startup that needed to do creative work fast!), but the bad actors someone was paranoid about could easily circumvent it anyway to exfiltrate data (using methods obvious to the skilled software people like they hired, some undetectable by any security product or even human monitoring they imagined), and all the good rule-following people would quit in incredulous frustration.
Unfortunately, it might not have been even the CEO's call, but a crazy investor.
If it doesn't fit on one machine, though, you don't have another option: Meta, for example, will never have a local dev env for Instagram or Blue. Then you need to make some hard choices.
Personally, my ideal cloud dev env is:
1. Local checkout of the code you're working on. You can use whatever IDE or text editor you prefer. For large monorepos, you'll need some special tooling to make sure it's easy to only check out slices of the repo.
2. Sync the code to the remote execution environment automatically, with hot-reloading.
3. Auto-port-forward from your local machine to the remote (see the sketch after this list).
4. Optionally be able to run dependent services on your personal remote to debug/test their interactions with each other, and optionally be able to connect to a well-maintained shared environment for dependencies you aren't working on. If you have a shared environment, it can't be viewed as less-important than production: if it's broken, it's a SEV and the team that broke it needs to drop everything and fix it immediately. (Otherwise the shared env will be broken all the time, and your shipping speed will either drop, or you'll constantly be shipping bugs to prod due to lack of dev care.)
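Points 2 and 3 don't need much machinery at small scale. A rough sketch with stock tools, assuming a host alias `devbox` and made-up paths:

    # push local edits to the remote on every change (entr is one of many file watchers)
    while true; do
      find . -type f | entr -d rsync -az --delete --exclude .git ./ devbox:~/src/myapp/
    done

    # make the remote dev server reachable at localhost:8080
    ssh -N -L 8080:localhost:8080 devbox

Dedicated sync tools (e.g. Mutagen) handle the hot-reload loop more gracefully, but the shape is the same.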
At Meta we didn't have (1): everyone had to use VSCode, with special in-house plugins that synced to the remote environment. It was okay but honestly a little soul-sucking; I think customizing your tooling is part of a lot of people's craft and helps maintain their flow state. Thankfully we had the rest, so it was tolerable if not enjoyable. At Airbnb we didn't have the political will to enforce (4), so the dev env was always broken. I think (4) is actually the most critical part: it doesn't matter how good the rest of it is, if the org doesn't care about it working.
But yeah — if you don't need it, that's a lot of work and politics. Use local environments as long as you possibly can.
It'll work if the company can offer something similar to EC2. Unfortunately, most companies are not capable of doing so if they are not on the cloud.
> I have seen several attempts to move dev environments to a remote host. They invariably suck.
That slides straight into “therefore they will always suck and have no benefits and nobody should ever use them ever”. Apologies for the hyperbole, but I’m making a point: comments like these tend to shut down interesting explorations of the state of the art of remote computing and what the pros/cons are.
Edit: In a world where users demand that companies implement excellent security then we must allow those same companies to limit physical access to their machines as much as possible.
Ex - even on a VERY good connection, RTT on the network is going to exceed your frame latency for a computer sitting in front of you (a 60 Hz frame takes ~16 ms; even a good cross-country round trip is several times that), before we even get into the latency of the actual frame rendering on that remote computer. There's just no solution for "make the light go faster".
Then we get into the issues the author actually laid out quite compellingly - Shared resources are unpredictable. Is my code running slowly right now because I just introduced an issue, or is it because I'm sharing an env and my neighbor just ate 99% of the CPU/IO, or my network provider has picked a different route and my latency just went up 500ms?
And that's before we even touch the "My machine is down/unreachable, I don't know why and I have no visibility into resolving the issue, when was my last commit again?" style problems...
> Edit: In a world where users demand that companies implement excellent security then we must allow those same companies to limit physical access to their machines as much as possible.
And this... is just bogus. We're not talking about machines running production data. We're talking about a developer environment. Sure - limit access to prod machines all you like; while you're at it, don't give me any production user data either - I sure as hell don't want it for local dev. What I do want is a fast system that I control, so that I can actually tweak it as needed to develop and debug the system. It is almost impossible to give a developer "the least access needed" to do development, because if you knew ahead of time exactly what access the work required, the development would already be done.
I wonder if Microsoft's approach for Dev Box is the right one.
Overall I agree with you that this is how it should be, but as DevOps working with so many development teams, I can tell you that too many developers know a language or two but beyond that barely know how to use a computer. Most developers (yes, even most of the ones in Silicon Valley or the larger Bay Area) with Macbooks will smile and nod when you tell them that Docker Desktop runs a virtual machine to run a copy of Linux to run OCI images, and then not too much later reveal themselves to have been clueless.
Commenters on this site are generally expected to be in a different category. Just wanted to share that, as a seasoned DevOps pro, I can tell you it's pretty rough out there.
I'm not recommending this as a best practice. I just believe that we, as developers, end up creating some myths to ourselves of what works and what doesn't. It's good to re-evaluate these beliefs now and then.
If you stick to the tried-and-true libs and change your function kwargs or method names when you get warnings, then I’ve had pretty rock-steady reproducibility even with an un-versioned `python -m pip install -r requirements.txt`.
I could also be a slob or just not working at the bleeding edge of python lib deployment tho so take it with a grain of salt.
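For what it's worth, pinning costs almost nothing if you ever want to stop relying on luck; a minimal sketch of the same venv workflow:

    python -m venv .venv
    . .venv/bin/activate
    python -m pip install -r requirements.txt
    # snapshot the exact versions that are known to work
    python -m pip freeze > requirements.lock
    # later, reproduce that environment exactly
    python -m pip install -r requirements.lock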
python -m venv .venv

> This is the story of how (not) to build development environments in the cloud.
I'd like to request that the comment thread not turn into a bunch of generic k8s complaints. This is a legitimately interesting article about complicated engineering trade-offs faced by an organization with a very unique workload. Let's talk about that instead of talking about the title!
Super useful negative example, and the lengths they pursued to make it fit! And no knock on the initial choice or impressive engineering, as many of the k8s problems they hit likely weren't understood gaps at the time they chose k8s.
Which makes sense, given k8s roots in (a) not being a security isolation tool & (b) targeting up-front configurability over runtime flexibility.
Neither of which mesh well with the co-hosted dev environment use case.
Because I don't understand most of the article if it's the former. How are things like performance a concern for internal development environments? And why are so many things stateful? Ideally there should be some kind of configuration/secret management solution so that deployments are consistent.
If it's the latter, then this is incredibly niche and maybe interesting, but unlikely to be applicable to anyone else.
> This is not a story of whether or not to use Kubernetes for production workloads; that's a whole separate conversation. As is the topic of how to build a comprehensive soup-to-nuts developer experience for shipping applications on Kubernetes.
> This is the story of how (not) to build development environments in the cloud.
Perhaps a followup article will go into detail about their replacement.
Gitpod Flex is runner-based. The runner interface is intentionally generic so that we can support different clouds, on-prem or just Linux in future.
The first implemented runner is built around AWS primitives like EC2, EBS and ECS. But because of the more generic interface Gitpod now supports local / desktop environments on MacOS. And again, future OS support will come.
There’s a bit more information in the docs, but we will do some follow ups!
- https://www.gitpod.io/docs/flex/runners/aws/setup-aws-runner... - https://www.gitpod.io/docs/flex/gitpod-desktop
(I work at Gitpod)
Did you use Consul?
And that they're desperate to tell customers that they've fixed their problems.
Kubernetes is absolutely the wrong tool for this use case, and I argue that this should be obvious to someone in a CTO-level position, or their immediate advisors.
Kubernetes excels as a microservices platform, running reasonably trustworthy workloads. The key features of Kubernetes are rollout (highly available upgrades), elasticity (horizontal scaleout), bin packing (resource limits), CSI (dynamically mounted block storage), and so on. All this relates to a highly dynamic environment.
This is not at all what Gitpod needs. They need high performance disks, ballooning memory, live migrations, and isolated workloads.
Kubernetes does not provide you sufficient security boundaries for untrusted workloads. You need virtualization for that, and ideally physically separate machines.
Another major mistake they made was trying to build this on public cloud infrastructure. Of course the performance will be ridiculous.
However, one major reason for using Kubernetes is sharing the GPU. That is, to my knowledge, not possible with virtualization. But again, do you want to risk sharing your data, on a shared GPU?
To clarify on one of your points: Kubernetes itself has nothing to do with actually setting the security boundaries. It only provides a schema to describe resources and policies; an underlying system (perhaps Cilium for networking, or Kata Containers for micro-VMs) then ensures that the resources created actually follow those schemas and policies.
For example, Neon have built https://github.com/neondatabase/autoscaling which manages Neon Instances with Kubernetes by running them with QEMU instead. This allows them to do live migrations and resource (de)allocation while the service is running, without having to replace Kubernetes. These workloads are, as far as I understand it, stateless.
We've always had issues with stateful kubernetes setups. Can you share what makes it easier today than before? Genuinely interested.
What Neon is doing is quite a feat: Live migration (of a VM) while preserving TCP connections. It also took a lot of customization to achieve that.
But I agree that Kubernetes can indeed be used this way.
If anything, it further cements my original point about the Gitpod leadership.
The problem was never Kubernetes, but the dimwitted notion of using containers.
And then blaming Kubernetes for it: "We're leaving you."
Are you aware of the limits? It must run as root and privileged?
Example: What performance do you get out of your NVMe disks? Because these days you can build storage that delivers 100-200 GB/s.
https://www.graidtech.com/wp-content/uploads/2023/04/Results...
I bet few public cloud customers are seeing that kind of performance.
For anything stateful, monolithic, or that doesn't require autoscaling, I find LXC more appropriate:
- it can be clusterized (LXD/Incus), like K8S but unlike Compose
- it exposes some tooling to the data plane, especially a load balancer, like K8S
- it offers system instances with a complete distribution and an init system, like a VM but unlike a Docker container
- it can orchestrate both VMs (including Windows VMs) and LXC containers at the same time in the same cluster
- LXC containers have the same performance as Docker containers unlike a VM
- it uses a declarative syntax
- it can be used as a foundation layer for anything stateful or stateless, including the Kubernetes cluster
LXD/Incus sits somewhere between Docker Swarm and a vCenter cluster, which makes it one of the most versatile platforms. Nomad is also a nice contender; it cannot orchestrate LXC containers but can autoscale a variety of workloads, including Java apps and qemu VMs.
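A taste of the Incus workflow (instance names and images are illustrative):

    # a system container with a full distribution and init system
    incus launch images:debian/12 db
    incus config set db limits.memory 4GiB
    incus exec db -- systemctl status postgresql

    # the same tooling drives real VMs
    incus launch images:debian/12 legacy-app --vm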
In my opinion, k8s is great for stable and consistent deployment/orchestration of applications. Dev environments by default are in a constant state of flux.
I don’t understand the need for “cloud development environments” though. Isn’t the point of containerized apps to avoid the need for synchronizing dev envs amongst teams?
Or maybe this product is supposed to decrease onboarding friction?
The rest of our eng team just did dev on their laptops though. I do think there was a level of batteries-included-ness that came with the ephemeral dev envs which our less technical data scientists appreciated, but the rest of our developers did not. Just my 2c
It's also much cheaper to hire contractors and give them a CDE that can be terminated on a moment's notice.
>Kubernetes seems like the obvious choice for building out remote, standardized and automated development environments
- Is it really the Obvious Choice™ though, Fred?
- Hmm, let's consult the graphs.
>Kubernetes is a container orchestration system for automating software deployment.
- It's about automating deployment, Carl, not development environments!

> Kubernetes is not the right choice for building development environments, as we’ve found.

Based on this information, it is hard to justify even considering k8s for the problem that Gitpod has.
https://static.googleusercontent.com/media/research.google.c...
I am not sure what differences k8s has compared to Borg. At the concept level they are pretty comparable.
You're running hot pods for crypto miners, and up against people who really want to see the rest of the code that box has ever seen. You should be isolating with something purpose-built like Firecracker, and do your own dispatch & shred for security.
So if you started with kubernetes and fought the whole process of why it's not a great solution to the problem, I have to assume you didn't understand the problem. I :heart: kubernetes, its complexity pays my bills - but it's barely a good CI solution when you trust everyone involved, it's definitely not a good one where you're trying to be general-purpose to everyone with a makefile.
I ended up with a mix of Nix and its VM build system, which is based on QEMU. The issue is that it's too tied to NixOS, and all services run in the same place, which forces you to manage ports and other things.
How I wish it could work: a flake that defines certain services; these services could or could not run in different µVMs sharing an isolated Linux network layer. Your flake could define your versions and your commands to interact with and manage the lifecycle of those µVMs. As the Nix store can be cached/shared, it can provide fast and reproducible builds after the first build.
Can you expand on this? Are you talking about containers you create?
I think this approach works best in small teams where everyone agrees to drink the Nix juice. Otherwise, it's caused nothing but strife in my company.
What we have seen work, especially when you are building a developer-centric product, is exposing these native issues around network, memory, compute, and storage to engineers; they are then more willing to work around them. Abstracting those issues away shifts the responsibility onto the product.
Having said that, I still think k8s is an upgrade when you have a large team.
1. Some operations on a remote host, done in a local-oriented way, are time-consuming and unmanageable.
2. With a vendor-specific approach, our skills would become deprecated and we'd have a dependency on the vendors.
3. Kubernetes is not the best tool, but it is popular.
As always, a custom solution is the most powerful, but it should be replaced with a more unified approach for the stability of development.
From a resource provider's perspective, the only way to squeeze a margin out of that space would be to reverse engineer 100% of human developer behavior so that you can ~perfectly predict "slack" in the system that could be reallocated to other users. Otherwise it's just a worse DX, like TFA gives examples of. Not a business I'm envious to be in... Just give everyone a dedicated VM or desktop, and make sure there's a batch system for big workloads.
A heterogeneous architecture with multi-tenancy poses some unique challenges because, as mentioned in the article, you get highly inconsistent usage patterns across different services. Also, arbitrary code execution (with sandboxing) can present a significant challenge. For security, you ideally need full isolation between services which belong to different users; this isolation wasn't a primary design goal of Kubernetes.
That said, you can probably still use K8s, but in a different way. For smaller customers, you could co-locate on the same cluster, but for larger customers which have high scalability requirements, you could have a separate K8s cluster for each one. Surely for such customers, it's worth the extra effort.
So in conclusion, I don't think the problems which were identified necessarily warrant abandoning K8s entirely, but maybe just a rethinking of how K8s is used. K8s still provides a lot of value in treating a whole cluster of computers as a single machine, especially if all your architecture is already set up for it. In addition to scheduling/orchestration, K8s offers a lot of very nice-to-have features like performance monitoring, dashboards, aggregated logs, ingress, health checks, ...
Also, there is a long tail of issues to be fixed if you do it with Kubernetes.
Kubernetes does not just give you scaling, it gives you many things: run on any architecture, be close to your deployment etc.
All the problems in the article also seem self-imposed. k8s can run stateful workloads just fine; don't start and stop them. Figure out the math on how much it costs to run a container 24/7, add your margin, and pass that cost on to the customer. The customer can decide to stop their containers to save $$, so the startup latency won't hurt; they'll accept it because they know they're saving money.
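The math really is back-of-envelope. Taking, say, a 4 vCPU / 16 GB VM at roughly $0.19/hour (ballpark on-demand pricing, not a quote):

    awk 'BEGIN { printf "24/7 cost: $%.2f/month\n", 0.19 * 24 * 30 }'   # ~$136.80

Add margin, publish the number, and let the customer choose between always-on and stop/start.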
Oddly, I left with a funny alternate takeaway: one by one, their clever in-house tweaks and scheduling preferences were recognized by the community and turned into standard k8s knobs.
So I'm back to the original question... What is fundamentally left? It sounds like one part is maintaining a clean container path to simplify a local deploy, which a lot of k8s teams do (ex: most of our enterprise customers prefer our docker compose & AMIs over k8s). But more importantly, something fundamental architecturally about how envs run that k8s cannot do, but they do not identify?
Still, some of the core challenges remain:
- the flexibility Kubernetes affords makes it hard to build and distribute a product with such specific requirements across the broad swath of differently set up Kubernetes installations. Managed Kubernetes services help, but come with their own restrictions (e.g. kernel versions on GKE).
- state handling and storage remain unsolved. PVCs are not reliable enough, are subject to a lot of variance (see the point above), and behave vastly differently depending on the backing storage. Local disks (which we use to this day) make workspace startup and backup expensive from a resource perspective and hard to predict timing-wise.
- user namespaces have come a long way in Kubernetes, but by themselves are not enough. /proc is still masked; FUSE is still not usable.
- startup times, specifically container pulls and backup restoration, are hard to optimize because they depend on a lot of factors outside of our control (image homogeneity, cluster configuration).
Fundamentally, Kubernetes simply isn't the right choice here. It's possible to make it work, but at some point the ROI of running on Kubernetes simply isn't there.
AFAICT, a lot of that comes down to storage abstractions, which I'll be curious to see the answer on! Pinned localstorage <> cloud native is frustrating.
I sense another big chunk is the fast secure start problems that firecracker (noted in the blogpost) solve but k8s is not currently equipped for. Our team has been puzzling that one for awhile, and part of our guess is incentives. It's been 5+ years since firecracker came out, so likewise been frustrating to see.
Bottom of the post.
To the people saying ultra-modern hardware could handle it: worth remembering the companies in question started on this path X years ago, with Y set of technologies and Z set of experiences.
Just because it made sense for Google in 2012 or whatever doesn't necessarily mean they would choose it again --or not-- given a do-over (but there's basically no way back).
> A simpler version of this setup is to use a single SSD attached to the node. This approach provides lower IOPS and bandwidth, and still binds the data to individual nodes.
Are you sure a single SSD is that slow? NVMe devices are so fast that I find it hard to believe there's any need for RAID 0.
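Easy enough to measure rather than guess; fio gives you the single-device numbers (path and sizes are arbitrary):

    # sequential read throughput of one NVMe device
    fio --name=seqread --filename=/mnt/nvme/testfile --rw=read --bs=1M \
        --size=8G --iodepth=32 --ioengine=libaio --direct=1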
Does anyone have any links for cluster-autoscaler plugins? Searching is drawing a blank, even in the cluster-autoscaler repo itself. Did this concept get ditched/removed?
Kubernetes has never ever struck me as a good idea for a development environment. I'm surprised it took the author this long to figure out.
K8s can be a lifesaver for production, staging, testing, ... depending on your requirements and infrastructure.
Sounds sane. Am I missing anything?
Glad someone said it out loud. So true. Apptainer has been a far better development experience for us.
The infrastructure is now incredibly understandable, simple, and cost-effective.
Kubernetes unnecessarily cost us over $1 million in both DevOps time and actual Google Cloud costs, and even worse, it cost us time to market. Stay off Kubernetes as long as you can in your company, unless you are basically forced onto it. You should view it as an unnecessary evil that comes with massive downsides in terms of complexity and cost.
1.) What would you think of things like hetzner / linode / digitalocean (if stable work exists)
2.) What do you think of https://sst.dev/ or https://encore.dev/ ? (They support rather easier migration)
3.) Could you please indicate the split of that $1 million between DevOps time and unnecessary Google Cloud costs? And were there outliers (like "oops, our intern didn't set this one variable, misconfigured the cloud, and wasted $10k on GCloud"), or was it more that bandwidth simply costs that much more on GCloud? (I don't think the latter is the case though.)
Looking forward to chatting with you!
But this is really a spurious concern. I myself used to care about it years ago. But in practice, people rarely switch between cloud providers because the incremental benefits are minor; the providers are nearly equivalent, and there is not much to be gained by moving from one to the other unless politics are involved (e.g. someone high up wants a specific provider).
https://github.com/bhouston/template-typescript-monorepo
This is my living template of best practices.
Yup. Isn't it Knative Serving or a home grown Google alternative to it? https://knative.dev/docs/serving/
The key is I am not managing Kubernetes and I am not paying for it - it is a fool's errand, and incredibly rarely needed. Who cares what is underneath the simple Cloud Run developer UX? What matters for me is cost, simplicity, speed and understandability. You get that with Cloud Run, and you don't with Kubernetes.
Anyway, as always it depends on what you want to use it for.
I guess the team just wants to rewrite everything; it happens. A manager should prevent that.