The latter takes a few seconds, the former is presumably longer. This is the great revelation of this blog post.
That's like saying waking your MacBook Pro is faster than booting from a powered-off state. Of course it is, and that's precisely why the option exists.
Though there's a long tail, so sometimes there can be a gap on the order of a couple of seconds between the sync response and when the bootloader transfers over to the kernel.
I wonder if Amazon would ever decide to offer booting the same image with the same hypervisor in EC2 as they do for lambdas?
This is why CodeSandbox, Namespace, and even fly.io built special-purpose architectures to guarantee extremely fast start-up times.
In the case of Namespace it's ~2sec on cold boots with a set of user-supplied containers, with storage allocations.
(Disclaimer, I'm with Namespace -- https://namespace.so)
https://docs.aws.amazon.com/codebuild/latest/userguide/actio...
On kraft.cloud we've fundamentally redesigned the cloud stack to be able to cold start containers/VMs in milliseconds (eg, about 20 millis for nginx, about 50 millis for a basic Node app), and also scale them to zero and autoscale them in milliseconds. If interested there's more info about the tech in our blog posts https://unikraft.io/blog/ .
[1]: https://justingarrison.com/blog/2024-02-08-fargate-is-not-fi...
I wonder if I should try out Illumos as I am rebuilding my home server, but I am afraid that due to lack of time, it'd take me ages to replicate my libvirt-based setup with around 30 services.
How well are Linux containers and VMs supported in Illumos? How about nested KVM? That could help with the transition as I am heavily leaning into GNU tools and KVM.
FWIW, when I first laid my eyes on a Sun workstation back in the 90s and saw the Control key where it rightly belongs (in the place of Caps Lock in broken layouts :), I said "duh" and have never moved back from remapped Caps-as-Ctrl.
It's complicated, but you can blame Sun for a lot of that; the primary thing was their explicit choice of license to make it incompatible (or hard to make compatible) with Linux.
But! Autoscaling serves two purposes. One is to address load spikes. The other is to reduce costs with scaling down. What this solution does is trade off some of the cost savings by prewarming the EBS volumes and then paying for them.
This feels like a reasonable tradeoff if you can justify the cost with better auto-scaling.
And if you're not autoscaling, it's still worth the cost if the trade off is having your engineers wait around for instance boots.
Are customers that are so cost sensitive served well by public clouds and attendant infrastructure?
They also do proactive scaling to address the issues you brought up, since they can predict with fairly high accuracy the normal viewing patterns.
For example scaling up on Saturday morning ahead of all the kids waking up.
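To make the proactive-scaling idea concrete, here is a minimal sketch of schedule-driven capacity. The schedule format, the function name, and every number in it are made up for illustration; this is not how Netflix or any AWS feature actually implements it.

```python
from datetime import datetime

# Hypothetical weekly schedule: (weekday, start_hour, end_hour, extra_instances).
# weekday follows datetime.weekday(): Monday=0 ... Saturday=5, Sunday=6.
SCHEDULE = [
    (5, 6, 10, 20),  # Saturday morning, ahead of the kids waking up (made-up numbers)
]

def desired_capacity(now: datetime, baseline: int, schedule=SCHEDULE) -> int:
    """Return the baseline plus any extra capacity scheduled for the current hour."""
    extra = sum(
        add
        for day, start, end, add in schedule
        if now.weekday() == day and start <= now.hour < end
    )
    return baseline + extra
```

Something like `desired_capacity(datetime.now(), baseline=10)` could then feed whatever sets the group's target size, with reactive scaling still handling whatever the schedule misses.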
Small nit, and this doesn't detract from your points. I don't think this is universally true by definition, even if it is almost always true. You could come up with some rare conditions where your traffic at t+5 minutes is actually easier to predict than at t+20 seconds. Of course, even in that case you're better off (ceteris paribus) being able to spin things up in 20 seconds.
[0] For example I can tell you exactly when spikes will happen to Netflix's servers on Saturday morning (because the kids all get up at the same time). And I can tell you there will be spikes on the hour during prime time as people shift from linear TV to streaming (or at least they did a lot more 10 years ago!). I can also tell you when spikes to Alexa will be because I already know what times people's alarms are set for.
In this day and age everyone should use autoscaling, relying on ODCR (capacity reservations) to guarantee that resources exist.
No shortcuts really, although a lot of web applications behave “kinda” similar.
Start conservatively and tweak from there.
Do we actually need all this stuff, or does it suffice to get one really powerful server (price less than $40k) and run Docker on it?
GitHub actions in the standard setup needs to run untrusted code and so you essentially need a VM.
You can lock it down at the cost of sacrificing features and usability, but that's a tradeoff.
On kraft.cloud (shameless plug) we build extremely specialized VMs (aka unikernels) where most of the code in them is the application code, and pair this with a fast, custom controller and other perf tweaks. We use Dockerfiles to build from, but when deploying we eliminate all of those layers you mention. Cold boot times are in milliseconds (e.g., nginx 20ms, a basic node app ~50ms), as are scale to zero and autoscale.
Getting rid of the overhead is possible but hard, unless you're willing to sacrifice things people really want.
1. Docker. Adds a few hundred msec of startup time to containers, configuration complexity, daemons, disk caches to manage, repositories .... a lot of stuff. In rigorously controlled corp environments it's not needed. You can just have a base OS distro that's managed centrally and tell people to target it. If they're building on e.g. the JVM then Docker isn't adding much. I don't use it on my own company's CI cluster for example, it's just raw TeamCity agents on raw machines.
2. VMs. Clouds need them because they don't trust the Linux kernel to isolate customers from each other, and they want to buy the biggest machines possible and then subdivide them. That's how their business model works. You can solve this a few ways. One is something like Firecracker where they make a super bare bones VM. Another would be to make a super-hardened version of Linux, so hardened people trust it to provide inter-tenant isolation. Another way would be a clean room kernel designed for security from day one (e.g. written in Rust, Java or C#?)
3. Drives on a distributed network. Honestly not sure why this is needed. For CI runners entirely ephemeral VMs running off read only root drive images should be fine. They could swap to local NVMe storage. I think the big clouds don't always like to offer this because they have a lot of machines with no local storage whatsoever, as that increases the density and allows storage aggregation/binpacking, which lowers their costs.
Basically a big driver of overheads is that people want to be in the big clouds because it avoids the need to do long term planning or commit capital spend to CI, but the cloud is so popular that providers want to pack everyone in as tightly as possible which requires strong isolation and the need to avoid arbitrary boundaries caused by physical hardware shapes.
If you know who's using your build server, you probably don't need isolation stronger than Docker, because they can go to jail for hacking it.
Do you have an example image and network config that would demonstrate that?
(I'd love to understand the performance limits of Docker containers, but never played with them deeply enough since they are usually in >1s space which is too slow for me to care)
Underneath, we use specialized VMs (unikernels), a custom controller and load balancer, as well as a number of perf tweaks to achieve this. But it's (now) certainly possible.
I mean an ass end M3 macbook has the same compile time as an i9-14900k. God knows what an equivalent Xeon/Epyc costs...
ISTM one could do much better with an immutable/atomic setup: set up an immutable read-only EBS volume, and have each instance share that volume and have a per-instance volume that starts out blank.
Actually pulling this off looks like it would be limited by the rules of EBS Multi-Attach. One could have fun experimenting with an extremely minimal boot AMI that streams a squashfs or similar file from S3 and unpacks it.
edit: contemplating a bit, unless you are willing to babysit your deployment and operate under serious constraints, EBS multi-attach looks like the wrong solution. I think the right approach would be to build a very very small AMI that sets up a rootfs using s3fs or a similar technology and optionally puts an overlayfs on top. Alternatively, it could set up a block device backed by an S3 file and optionally use it as a base layer of a device-mapper stack. There’s plenty of room to optimize this.
But the bigger issue might be durability. Most EBS types have rather low quoted durability, and, for a shared volume like this, that’s a problem. Using S3 instead would be better all around except for the smallish engineering effort and deployment effort needed.
Getting a tool like mkosi to generate a boot-from-S3 setup should be straightforward. Converting most any bootable container should also be doable, even automatically. Converting an AMI would involve more heuristics and be more fragile, but it ought to work reliably with most modern Linux distros.
Fundamentally, EBS volumes are designed to work more or less like actual disks, whereas modern scaled workloads want something closer to net-booted diskless machines.
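For the "read-only base plus blank per-instance layer" idea above, a rough sketch of the overlayfs step might look like the following. It only builds the `mount` argument list; the function name and all paths are hypothetical, and fetching the base image from S3 is left out entirely.

```python
def overlay_mount_command(lower, upper, work, target):
    """Build the argv for an overlayfs mount.

    lower  : read-only base tree (e.g. an unpacked squashfs), hypothetical path
    upper  : per-instance writable layer that starts out empty
    work   : overlayfs scratch directory, on the same filesystem as `upper`
    target : where the merged root filesystem appears
    """
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]
```

An early-boot script could pass the result to `subprocess.run(..., check=True)` as root, for example with `upper` and `work` on local scratch storage so nothing instance-specific ever touches the shared base.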
We would happily pay someone like depot for "here's the AMI I want to run & autoscale, can you please do it faster than AWS?"
We hit this problem with containers too - we'd _love_ to just run all our CI on something like fargate and have it automatically scale and respond to our demand, but the response times and rate limiting are just _so slow_ that it means instead we just end up starting/stopping instances with a lambda which feels so 2014.
Change that to "here's the ISO/IMG I want to run & autoscale, can you please do it faster than AWS?" and you'll have tons of options. Most platforms using Firecracker would most likely be faster, maybe try to use that as a search vector.
Fly.io comes up often [0] on HN, but there's an overwhelming amount of "it's a nice idea, but it just doesn't work" feedback on it.
> including running your tests, "thankfully", we use maven which means that our tests are part of the build lifecycle. It's a bit annoying because our CI provider has some neat parallelism stuff that we could lean on if we could separate out the test phase from the build phase. We use docker-compose inside our builders for dev dependencies (we run our tests against a real database running in docker) but I think they should be our only major issues here.
But thanks for the heads up.
> While this sounds like it would serve our needs, autoscaling groups are very slow to react to incoming requests to scale up. From experimentation, it appears that autoscaling groups may have a slow poll loop that checks if new instances are needed, so the delay between requesting a scale up and the instance starting can exceed 60 seconds. For us, this negates the benefit of the warm pool.
I pulled this from the article, but it's the same problem. Technically yes, eks + fargate works. In practice the response time from "thing added to queue" to "node is responding" is minutes with that setup.
Our game code is in P4, but our backend services are on GH. Having a single CI system means we get easy interop e.g. game updates can trigger backend pipelines and vice versa.
In the past I've used TeamCity, Jenkins, and ElectricCommander(!)
Recently I decided to actually look at boot times since I store in the db when the servers are requested and when they become ready and it turns out for me it's really bi-modal; some take about 15-20s and many take about 80s, see graph https://x.com/sadservers_com/status/1782081065672118367
Pretty baffled by this (same region, same pretty much everything), any idea why? Definitely going to try this trick in the article.
The second and third spikes at 80 and 140 seconds line up nicely with this kind of behavior.
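If the request and ready timestamps are already in the db, a quick way to see the modes is to bucket the durations. A minimal sketch, assuming the rows come out as `(requested_at, ready_at)` pairs in epoch seconds (that shape and the function name are my assumption, not the actual schema):

```python
from collections import Counter

def boot_time_histogram(records, bucket_s=10):
    """Count boots per duration bucket.

    records  : iterable of (requested_at, ready_at) pairs in epoch seconds
    bucket_s : bucket width in seconds
    Returns {bucket_start_seconds: count}, ordered by bucket.
    """
    counts = Counter()
    for requested_at, ready_at in records:
        duration = ready_at - requested_at
        bucket = int(duration // bucket_s) * bucket_s
        counts[bucket] += 1
    return dict(sorted(counts.items()))
```

Splitting the same histogram by instance type, AZ, or spot vs on-demand would be the obvious next step to see which dimension separates the two modes.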
The second spike would be optimised workloads that can respond to spot interruption in under 60 seconds.
The third spike would be Spot workloads that are being force-terminated.
The reason it's falling on those bounds is that whatever is trying to schedule your workload only re-checks for free capacity once a minute.
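As a back-of-the-envelope check of that theory: if a boot takes roughly 20 s once capacity is found, and every failed capacity check costs one full re-check interval, the observed times land exactly on those spikes. The 20 s base and the 60 s interval are assumptions taken from the numbers in this thread, not anything AWS documents.

```python
def observed_start_delay(base_boot_s, missed_checks, poll_interval_s=60.0):
    """Time from request to ready when the scheduler only re-checks
    for capacity every `poll_interval_s` seconds (assumed model)."""
    return base_boot_s + missed_checks * poll_interval_s

# 0, 1 and 2 missed capacity checks with an assumed 20 s base boot time.
delays = [observed_start_delay(20, n) for n in range(3)]
print(delays)  # [20.0, 80.0, 140.0]
```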
I used to be able to spin up spot instances and basically never get interruptions. They'd stay on for weeks/months.
In my experience, it used to be fairly safe to have Spot instances for most workloads. You'd almost never get Spot interruptions. Now, some regions and instance types are difficult to run Spot instances at all.
Almost all my ec2 instances are spot, and actually I can compare the distribution with the on-demand ones.
My spot instances are very short lived (15-30 mins max) and AFAIK I've never seen a spot instance force-terminated (this would be hard to find I think).
Often the technology is the easier part.
The difficult part is how to name the feature intuitively, adding to an ocean of jargon and documentation, and making the configuration knobs intuitive both in UI and CLI/SDK.
Amazon Simple Compute Service :) ?
That said, I think this is a problem they could likely solve with that functionality, and we'd love to use it.
[0] https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-au...
[0]: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-fast-sn...
After having typed all that out, I recalled that OpenStack exists and thus they may get you even further toward your goal since they are trying to be on-prem AWS: https://docs.openstack.org/auto-scaling-sig/latest/theory-of...
It looks like their use-case fetches all the data it needs from the network (in the form of the GH Actions runner getting the job from GitHub, and then pulling down Docker containers, etc).
What they need is a minimal Linux install (Arch Linux would be good for this) in a squashfs/etc, and the only thing in EBS should be an HTTP-aware boot loader like iPXE or a kernel+initrd capable of pulling down the squashfs from S3 and running it from memory. Local "scratchspace" storage for the build jobs can be provided by the ephemeral NVMe drives, which are also direct-attach and much faster than EBS.
>If AWS responds that there is no current capacity for m7a instances, the instance is updated to a backup type (like m7i) and started again
Any ideas why m7i would be chosen as the backup type rather than the other way around? m7a seems to be more expensive than m7i, so maybe there's some performance advantage or something else I'm missing that makes AMD CPU containing instances preferable to Intel ones?
It seems like performance wise the AMD processors are (in certain workloads) quite a bit faster than their Intel equivalent: https://www.phoronix.com/review/aws-m7a-ec2-benchmarks/2 (in later pages it seems to be a little bit more mixed)
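For reference, the fallback behaviour quoted from the article boils down to "try the preferred type, move on to the next if there's no capacity". A minimal sketch of that loop, where `start_fn` and `InsufficientCapacity` are hypothetical stand-ins for whatever the real EC2 call and its capacity error look like in their code:

```python
class InsufficientCapacity(Exception):
    """Hypothetical error raised by start_fn when a type has no capacity."""

def start_with_fallback(instance_types, start_fn):
    """Try each instance type in preference order.

    Returns (instance_type, result_of_start_fn) for the first type that starts.
    Raises InsufficientCapacity if every type is out of capacity.
    """
    for instance_type in instance_types:
        try:
            return instance_type, start_fn(instance_type)
        except InsufficientCapacity:
            continue  # no capacity for this type, try the next one
    raise InsufficientCapacity(f"no capacity for any of {list(instance_types)}")
```

The ordering of the list is the whole policy here, so putting m7a first just encodes a preference for it whenever it is available.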
Shut-down standbys are absolutely the way to do it.
Does AWS offer anything for this? It's very tedious to set up.
Snapshots are persisted into S3 (transparently to the user) but it means each new EBS volume spawned doesn't start at full IOPS allocation.
I presume this is due to EBS volumes being AZ-specific, so to be able to launch an AMI-seeded EBS volume in any AZ it needs to go via S3 (multi-AZ).
I'd guess it's likely that EBS is using a tiered caching system, where they'll keep live volumes around for Copy-on-write cloning for the more popular images/snapshots, with slightly less popular images maybe stored in an EBS cache of some form, before it goes all the way back to S3. You're just not likely to end up getting a live volume level of caching until you hit a certain threshold of launches.
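Whatever the caching looks like internally, the user-visible effect is that blocks of a snapshot-backed volume are slow the first time they are touched, so the usual workaround is to read every block once before the volume is needed. A rough sketch of that warm-up pass; the function name is made up, and in practice tools like `dd` or `fio` against the raw device are the more typical way to do it:

```python
def read_all_blocks(path, block_size=1024 * 1024):
    """Sequentially read every block of `path` once and return the bytes read.

    `path` would be a block device such as /dev/nvme1n1 (hypothetical name,
    needs root); a regular file works the same way for trying it out.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

This goes through the page cache and is single-threaded, so treat it as an illustration of the idea rather than a fast initializer.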
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/win-a...
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/win-a...
It’s not cheap, but it speeds things up.
I make a similar product offering fast GitHub Actions runners[1] and we've been down this rabbit hole of boot time optimization.
Eventually, we realized that the best solution is to actually build scale. There are two factors in your favor then: 1) Spikes are less pronounced and the workloads are a lot more predictable. 2) The predictability means that you have a decent estimate of the workload to expect at any given time, within reason for maintaining an efficient warm pool.
This enables us to simplify the stack and not have high-maintenance optimizations while delivering great user experience.
We have some pretty heavy use customers that enable us to do this.
This. I went the same route with regards to boot time optimisations for [1] (cleaning up the AMI, cloud-init, etc.), and can boot a VM from cold in 15s (I can't rely on prewarming pools of machines -- even stopped -- since RunsOn doesn't share machines with multiple clients and this would not make sense economically).
But the time taken by the official runner binary to load and then get assigned a job by GitHub is always around 8s, which is more than half of the VM boot time :( At some point it would be great if GitHub could give us a leaner runner binary with less legacy stuff, and tailored for ephemeral runners (that, or reverse-engineer the protocol).
Surely an optimized approach here looks something like booting customer CI workloads directly from the hypervisor, using an ISO/squashfs/etc. stored directly on the hypervisor, where the only networked disks are the ones with the customers' BuildKit caches?
You can launch a stripped-down distribution with what, a 200 MB disk? Then attach the “useful” EBS volume, and “do stuff” with that - launch a container, or whatever.
the server broadcasts at 200 MB/s[2]. the whole setup costs me $3-4 usd/hour and by far the slowest part of boot is my game compiling on the central server, whether i store ccache data in s3 or not. i've booted this every day for the last 6 months, to test the game.
if your system can't handle 30s vm boots, your system should improve.
their own tldr should be at the top not middle of the article :)