Making EC2 boot time faster

Hetzner does not offer network block storage comparable to EBS that can be used as a root (bootable) file system. AWS locally attached ephemeral disks are also immediately available but cannot be seeded with data (as with Hetzner, they are wiped clean ahead of boot).

andersa2y ago

This is an advantage. EBS is terrible! Literally orders of magnitude slower than modern SSDs.

At least if you're using the Ec2 optimized amis, Ec2 instances frequently boot fast enough that they'll be executing your username initialization before you get the run instances response.

Though there's a long tail, so sometimes there can be a gap on the order of a couple of seconds between the sync response and when the bootloader transfers over to the kernel.

tekla2y ago

They have and I know this because I've hammered them on this because we demand thousands of instances to autoscale very aggressively in 1-3 minutes. Very few people give a shit about initialization times. They care more about instance ready times which is constrained by the OS that is running.

attentive2y ago

It depends on instance type and OS and can be real short on ec2.

matt-p2y ago

What's size got to do with boot time? Serious question.

develatio2y ago

By "the size" I meant to say "the size of the infrastructure", meaning that AWS has to manage orders of magnitude more instances than Hetzner. This might well contribute to "things" being slower.

RationPhantoms2y ago

More employed eyes on an issue or ability to compensate the best-in-class engineers to take a look.

ldoughty2y ago

The article mentions hydration (a process to reduce the penalty of first access of a data block). This is done by essentially reading the entire EBS volume with a tool like fio or dd. The time it takes to complete this process is proportional to the number of dirty blocks, so it will take twice as long to hydrate 20 GB of data as 10 GB.
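A minimal Python sketch of the hydration pass described above: it reads a volume (or, for a dry run, any ordinary file) sequentially from start to end and throws the data away, which is roughly what a sequential read with dd or fio does. The function name, the 1 MiB chunk size, and the device path in the usage note are illustrative assumptions, not something taken from the article.

```python
def hydrate(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Read every byte of `path` once and return the number of bytes read.

    Touching each block makes EBS fetch it from the snapshot, so later
    reads of that block no longer pay the first-access penalty.
    """
    total = 0
    # buffering=0 gives a raw file object, avoiding an extra copy in Python.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:  # empty read means end of device or file
                break
            total += len(chunk)
    return total
```

On an instance this would be called with the block device of the volume, for example `hydrate("/dev/nvme1n1")` (a hypothetical device name), and usually needs root.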

playingalong2y ago

Likely they mean that following Conway's law in AWS there are more abstraction layers involved.

CaptainOfCoit2y ago

Smaller companies are faster and more nimble than larger corporations.

https://docs.aws.amazon.com/codebuild/latest/userguide/actio...

necovek2y ago· 11 in thread

From a technical perspective, Amazon has actually optimized this but turned that into "serverless functions": their ultra-optimized image paired with Firecracker achieves ultra-fast boot-up of virtual Linux machines. IIRC from when Firecracker was being introduced, they are booting up in sub-second times.

I wonder if Amazon would ever decide to offer booting the same image with the same hypervisor in EC2 as they do for lambdas?

20thr2y ago

100% -- EC2's general purpose nature is not in my opinion the best fit for ephemeral use-cases. You'll be constantly fighting the infrastructure as the set of trade-offs and design goals are widely different.

This is why CodeSandbox, Namespace, and even fly.io built special-purpose architectures to guarantee extremely fast start-up times.

In the case of Namespace it's ~2sec on cold boots with a set of user-supplied containers, with storage allocations.

(Disclaimer, I'm with Namespace -- https://namespace.so)

arianvanp2y ago

And AWS now has a product to spin up Lambdas for GitHub Actions CI runners

fhuici2y ago

[Disclaimer: I'm with KraftCloud] For what it's worth, Firecracker/the VMM is only one part of the boot process. Among others, there's also the controller and the VM/OS itself that typically slow things down. In other words, it's not enough to just switch in Firecracker and expect cold starts to immediately drop to sub-second levels.

On kraft.cloud we've fundamentally redesigned the cloud stack to be able to cold start containers/VMs in milliseconds (eg, about 20 millis for nginx, about 50 millis for a basic Node app), and also scale them to zero and autoscale them in milliseconds. If interested there's more info about the tech in our blog posts https://unikraft.io/blog/ .

cr125rider2y ago

Fargate is an alternative that runs on Firecracker as well. It's hidden behind ECS and EKS, however.

mochomocha2y ago

According to [1] Fargate is actually not using Firecracker, but probably something closer to a single container running in a single-tenant ec2 VM. If true, this makes VM boot-time optimizations and warm pooling even more important for such a product.

[1]: https://justingarrison.com/blog/2024-02-08-fargate-is-not-fi...

plopz2y ago

Fargate is too slow without the container cache you can get with ec2

scarface_742y ago

And CodeBuild…

solarengineer2y ago

Not an AWS example, but on my Illumos Zones on an i5 at Hetzner, I get from zero to ssh in under 50 ms. I am certain of the numbers since I have used DTrace to measure. It is unfortunate that Illumos is not popular enough for a multitude of reasons.

necovek2y ago

Wow, that's amazing!

I wonder if I should try out Illumos as I am rebuilding my home server, but I am afraid that due to lack of time, it'd take me ages to replicate my libvirt-based setup with around 30 services.

How well are Linux containers and VMs supported in Illumos? How about nested KVM? That could help with the transition as I am heavily leaning into GNU tools and KVM.

FWIW, when I first laid my eyes on a Sun workstation back in the 90s and saw the Control key where it rightly belongs (in the place of Caps Lock in broken layouts :), I said "duh" and have never moved back from remapped Caps-as-Ctrl.

oblio2y ago

> It is unfortunate that Illumos is not popular enough for a multitude of reasons.

It's complicated, but you can blame Sun for a lot of that. The primary thing was their explicit choice of license to make it incompatible (or hard to make compatible) with Linux.

solarengineer2y ago

I should add - this is a full Zone with a number of services. One of my half-finished projects was to trace the call graph of the various services started and disable those not needed.

jedberg2y ago· 10 in thread

Boot time is the number one factor in your success with auto-scaling. The smaller your boot time, the smaller your prediction window needs to be. Ex. If your boot time is five minutes, you need to predict what your traffic will be in five minutes, but if you can boot in 20 seconds, you only need to predict 20 seconds ahead. By definition your predictions will be more accurate the smaller the window is.

But! Autoscaling serves two purposes. One is to address load spikes. The other is to reduce costs with scaling down. What this solution does is trade off some of the cost savings by prewarming the EBS volumes and then paying for them.

This feels like a reasonable tradeoff if you can justify the cost with better auto-scaling.

And if you're not autoscaling, it's still worth the cost if the trade off is having your engineers wait around for instance boots.
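A rough Python sketch of the prediction-window argument above. It assumes a deliberately simple linear traffic model, and all the numbers (requests per second, growth rate, per-instance capacity) are made up for illustration.

```python
import math

def instances_needed(current_rps: float, growth_rps_per_s: float,
                     boot_time_s: float, rps_per_instance: float) -> int:
    """Instances that must already be launching to cover load at now + boot time."""
    # The forecast has to reach as far ahead as an instance takes to boot.
    forecast_rps = current_rps + growth_rps_per_s * boot_time_s
    return math.ceil(forecast_rps / rps_per_instance)

slow = instances_needed(1000, 5, boot_time_s=300, rps_per_instance=100)
fast = instances_needed(1000, 5, boot_time_s=20, rps_per_instance=100)
print(slow, fast)
```

Under this model, being wrong about the growth rate by 1 request/s per second shifts the forecast by 300 requests/s with a five-minute boot, but only by 20 requests/s with a 20-second boot.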

fhuici2y ago

Fully agree, doing reactive autoscaling when the actual boot time is slow is an inherently hard problem. We've done years of research into building specialized VMs (unikernels) and fast controllers to be able to provide infra that allows VMs/containers to cold start, and thus autoscale/scale to zero in milliseconds (eg, a simple Node app cold starts in ~50 ms). If interested, you can try it out at kraft.cloud, or check out info about the tech in our blogs (https://unikraft.io/blog/) or the corresponding LF OSS project (www.unikraft.org).

deivid2y ago

Unikraft is really cool, but Linux is not necessarily the blocker. You can boot to PID1 in firecracker in ~6ms, see my experiments: https://blog.davidv.dev/minimizing-linux-boot-times.html

pradn2y ago

I assume the majority of people are autoscaling merely for diurnal and weekly traffic swings, where the signal window could be as high as 30 min or 1 hour. Do folks really see sub-minutely autoscaling?

Are customers that are so cost sensitive served well by public clouds and attendant infrastructure?

jedberg2y ago

Netflix scales on short intervals for reliability. Scale up quickly to handle a spike in traffic, scale down slowly to save money.

They also do proactive scaling to address the issues you brought up, since they can predict with fairly high accuracy the normal viewing patterns.

For example scaling up on Saturday morning ahead of all the kids waking up.
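The "scale up quickly, scale down slowly" policy can be sketched in a few lines of Python. This is an illustrative policy only, not Netflix's implementation; the function name and the one-instance-per-tick step-down limit are assumptions.

```python
def next_capacity(current: int, desired: int, max_step_down: int = 1) -> int:
    """Jump straight up to `desired`, but shed at most `max_step_down` per tick."""
    if desired >= current:
        return desired  # scale up immediately to absorb the spike
    return max(desired, current - max_step_down)  # drain slowly

capacity = 10
history = []
for desired in [10, 40, 12, 12, 12]:
    capacity = next_capacity(capacity, desired)
    history.append(capacity)
print(history)
```

The asymmetry means a short dip in demand after a spike does not immediately throw away capacity that might be needed again a moment later.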

sfilmeyer2y ago

>By definition your predictions will be more accurate the smaller the window is.

Small nit, and this doesn't detract from your points. I don't think this is universally true by definition, even if it is almost always true. You could come up with some rare conditions where your traffic at t+5 minutes is actually easier to predict than at t+20 seconds. Of course, even in that case you're better off (ceteris paribus) being able to spin things up in 20 seconds.

jedberg2y ago

I can come up with a lot of examples where it is easier to predict further out[0], but that also means I can predict them 20 seconds out. :)

[0] For example I can tell you exactly when spikes will happen to Netflix's servers on Saturday morning (because the kids all get up at the same time). And I can tell you there will be spikes on the hour during prime time as people shift from linear TV to streaming (or at least they did a lot more 10 years ago!). I can also tell you when spikes to Alexa will be because I already know what times people's alarms are set for.

bushbaba2y ago

Autoscaling also makes performance insights easier. It keeps the resources per processed request relatively consistent over time, whereas resizing manually can lead to a lot of operational complexity in understanding how your service will react under different loads.

In this day and age everyone should use autoscaling, relying on ODCR (On-Demand Capacity Reservations) to guarantee that resources exist.

cricketlover2y ago

won't there be more noise while predicting just 20s in advance? The longer the duration, the less effects we will see of temporary events like network blips etc. no? sorry I'm new to software engineering and just trying to learn.

wongarsu2y ago

However with a smaller prediction interval you can dampen your autoscaling more. If you predict 20s into the future, react, and 20s later you see how that changed the situation you can afford to spin very few instances up and down each 20s. If you have to predict 5m into the future you might have to take much stronger actions because any effect is delayed by the 5m startup interval.

cyberpunk2y ago

There’s no one answer for it, you need to learn your traffic / resource usage patterns and tune the scaling to match your situation.

No shortcuts really, although a lot of web applications behave “kinda” similar.

Start conservatively and tweak from there.

immibis2y ago· 9 in thread

There's something to say about building a tower of abstractions and then trying to tear it back down. We used to just run a compiler on a machine. Startup time: 0.001 seconds. Then we'd run a Docker container on a machine. Startup time: 0.01 seconds. Fine, if you need that abstraction. Now apparently we're booting full VMs to run compilers - startup time: 5 seconds. But that's not enough, because we're also allocating a bunch of resources in a distributed network - startup time: 40 seconds.

Do we actually need all this stuff, or does it suffice to get one really powerful server (price less than $40k) and run Docker on it?

iudqnolq2y ago

That doesn't solve the same problem.

GitHub actions in the standard setup needs to run untrusted code and so you essentially need a VM.

You can lock it down at the cost of sacrificing features and usability, but that's a tradeoff.

fhuici2y ago

We don't need all of those layers and abstractions of course. But if we do things right we also don't need to go the bare metal server route -- cloud platforms, if done right, can provide both strong, hardware-level (read: vm) isolation plus fast starts.

On kraft.cloud (shameless plug) we build extremely specialized VMs (aka unikernels) where most of the code in them is the application code, and pair this with a fast, custom controller and other perf tweaks. We use Dockerfiles to build from, but when deploying we eliminate all of those layers you mention. Cold boot times are in milliseconds (e.g., nginx 20ms, a basic node app ~50ms), as are scale to zero and autoscale.

mike_hearn2y ago

A really powerful server should not cost you anywhere near $40k unless you're renting bare metal in AWS or something like that.

Getting rid of the overhead is possible but hard, unless you're willing to sacrifice things people really want.

1. Docker. Adds a few hundred msec of startup time to containers, configuration complexity, daemons, disk caches to manage, repositories .... a lot of stuff. In rigorously controlled corp environments it's not needed. You can just have a base OS distro that's managed centrally and tell people to target it. If they're building on e.g. the JVM then Docker isn't adding much. I don't use it on my own company's CI cluster for example, it's just raw TeamCity agents on raw machines.

2. VMs. Clouds need them because they don't trust the Linux kernel to isolate customers from each other, and they want to buy the biggest machines possible and then subdivide them. That's how their business model works. You can solve this a few ways. One is something like Firecracker where they make a super bare bones VM. Another would be to make a super-hardened version of Linux, so hardened people trust it to provide inter-tenant isolation. Another way would be a clean room kernel designed for security from day one (e.g. written in Rust, Java or C#?)

3. Drives on a distributed network. Honestly not sure why this is needed. For CI runners entirely ephemeral VMs running off read only root drive images should be fine. They could swap to local NVMe storage. I think the big clouds don't always like to offer this because they have a lot of machines with no local storage whatsoever, as that increases the density and allows storage aggregation/binpacking, which lowers their costs.

Basically a big driver of overheads is that people want to be in the big clouds because it avoids the need to do long term planning or commit capital spend to CI, but the cloud is so popular that providers want to pack everyone in as tightly as possible which requires strong isolation and the need to avoid arbitrary boundaries caused by physical hardware shapes.

immibis2y ago

$40k to buy the server, not to rent per month.

If you know who's using your build server, you probably don't need isolation stronger than Docker, because they can go to jail for hacking it.

necovek2y ago

How do you get Docker container startup time of 0.01s with any real-life workload (yes, I know they are just processes, so you could build a simple "hello world" thing, but I'd be surprised if even that runs this fast)?

Do you have an example image and network config that would demonstrate that?

(I'd love to understand the performance limits of Docker containers, but never played with them deeply enough since they are usually in >1s space which is too slow for me to care)

fhuici2y ago

On kraft.cloud we use Dockerfiles to build into extremely specialized VMs for deployment. With this in place, we can have say an nginx server cold started and ready to serve at a public URL in about 20 millis (not quite the 10ms you mention, but in the right ballpark, and we're constantly shaving that down). Heavier apps can take longer of course, but not too much (e.g., node/next < 100ms). Autoscale and scale to zero also operate in those timescales.

Underneath, we use specialized VMs (unikernels), a custom controller and load balancer, as well as a number of perf tweaks to achieve this. But it's (now) certainly possible.

cjk22y ago

I'm mostly just running the (Go) compiler on my laptop which is considerably faster than on docker and considerably cheaper than the server...

I mean an ass end M3 macbook has the same compile time as an i9-14900k. God knows what an equivalent Xeon/Epyc costs...

immibis2y ago

Maybe your container isn't set up right - Docker containers run directly on the host, just partitioned off from accessing stuff outside of themselves with the equivalent of chroot. Or it could be a Mac-specific thing. Docker only works that way on Linux, and has to emulate Linux on other platforms.

benwaffle2y ago

reminds me of https://world.hey.com/dhh/we-re-moving-continuous-integratio...

amluto2y ago· 8 in thread

I don’t use EC2 enough to have played with this, but a big part here is the population of the AMI into the per-instance EBS volume.

ISTM one could do much better with an immutable/atomic setup: set up an immutable read-only EBS volume, and have each instance share that volume and have a per-instance volume that starts out blank.

Actually pulling this off looks like it would be limited by the rules of EBS Multi-Attach. One could have fun experimenting with an extremely minimal boot AMI that streams a squashfs or similar file from S3 and unpacks it.

edit: contemplating a bit, unless you are willing to babysit your deployment and operate under serious constraints, EBS multi-attach looks like the wrong solution. I think the right approach would be to build a very very small AMI that sets up a rootfs using s3fs or a similar technology and optionally puts an overlayfs on top. Alternatively, it could set up a block device backed by an S3 file and optionally use it as a base layer of a device-mapper stack. There’s plenty of room to optimize this.

Szpadel2y ago

we used s3fs in production. please don't use it, it's unreliable, unpredictable failure modes, can bring whole instance down. if you really need something like that use rclone mount

mdaniel2y ago

I believe they addressed this in their post because one cannot (currently?) `aws ec2 run-instances --volume-id vol-cafebabe`, rather one can only tell AWS what volume parameters to use when they create the root device. Your theory may still be sound about using some kind of super bare bones AMI but there will be no such outcome of "hey, friend, use this existing EBS as your root volume, don't create a new one"

[0] https://news.ycombinator.com/item?id=39363499

Isn’t EBS multi-attach only available for the (very expensive) io1 / io2 volume types?

amluto2y ago

Hmm, it does look like it, although one could carefully use large IO.

But the bigger issue might be durability. Most EBS types have rather low quoted durability, and, for a shared volume like this, that’s a problem. Using S3 instead would be better all around except for the smallish engineering effort and deployment effort needed.

Getting a tool like mkosi to generate a boot-from-S3 setup should be straightforward. Converting most any bootable container should also be doable, even automatically. Converting an AMI would involve more heuristics and be more fragile, but it ought to work reliably with most modern Linux distros.

attentive2y ago

That's reinventing ebs/ami/snapshots. They are already doing it i.e. data goes lazily from s3 to ebs/ec2.

amluto2y ago

It’s not, though. The way to boot a transient OS (just like a transient instance of a container on a machine/instance with a container runtime) is to give userspace read-only access to the image. It can be outright read-only or it can be an actual efficient overlay mechanism (qcow, overlayfs, device-mapper snapshot, etc). EBS, as the article notes, can’t actually do a read-only mount of a snapshot at all, and it’s very inefficient at instantiating a volume from a snapshot.

antihero2y ago

Could you make a snapshot of the booted instance then and boot other instances from that?

amluto2y ago

That seems like it would have exactly the same problem. The problem is that EBS volumes load very inefficiently from snapshots. (They’re also unnecessarily expensive: you pay for (number of instances times size) despite the fact that what you actually want is for each instance to read exactly the same data.)

Fundamentally, EBS volumes are designed to work more or less like actual disks, whereas modern scaled workloads want something closer to net-booted diskless machines.

maccard2y ago· 8 in thread

I don't use GHA as some of our code is stored in Perforce, but we've faced the same challenges with EC2 instance startup times on our self managed runners on a different provider.

We would happily pay someone like depot for "here's the AMI I want to run & autoscale, can you please do it faster than AWS?"

We hit this problem with containers too - we'd _love_ to just run all our CI on something like fargate and have it automatically scale and respond to our demand, but the response times and rate limiting are just _so slow_ that it means instead we just end up starting/stopping instances with a lambda which feels so 2014.

CaptainOfCoit2y ago

> We would happily pay someone like depot for "here's the AMI I want to run & autoscale, can you please do it faster than AWS?"

Change that to "here's the ISO/IMG I want to run & autoscale, can you please do it faster than AWS?" and you'll have tons of options. Most platforms using Firecracker would most likely be faster, maybe try to use that as a search vector.

maccard2y ago

Can you maybe share some examples? We're fine to use other image formats, but a lot of the value of AWS is that the services interact, IAM works nicely together, etc.

Fly.io comes up often [0] on HN, but there's an overwhelming amount of "it's a nice idea, but it just doesn't work" feedback on it.

abatilo2y ago

Depot also does remote docker builds using a remote build kit agent. It was actually their original product. If you could feasibly put everything into a Dockerfile, including running your tests, then you could use that product and get the benefits.

maccard2y ago

I actually didn't know this. We've had some teething issues _building_ in docker, but we actually run our services in containers. I'm sure a few hours of banging my head against a wall would be worth it here.

> including running your tests

"Thankfully", we use maven which means that our tests are part of the build lifecycle. It's a bit annoying because our CI provider has some neat parallelism stuff that we could lean on if we could separate out the test phase from the build phase. We use docker-compose inside our builders for dev dependencies (we run our tests against a real database running in docker) but I think they should be our only major issues here.

But, Thanks for the heads up.

Szpadel2y ago

I haven't fully investigated Fargate's limitations, but I think it would be possible to use any k8s-native CI on EKS + Fargate, maybe even use KubeVirt for VM creation? From my exploration of Fargate with EKS, AWS provisioned capacity in around 1s.

maccard2y ago

> AWS offers something very similar to this approach called warm pools for EC2 Auto Scaling. This allows you to define a certain number of EC2 instances inside an autoscaling group that are booted once, perform initialization, then shut down, and the autoscaling group will pull from this pool of compute first when scaling up.

> While this sounds like it would serve our needs, autoscaling groups are very slow to react to incoming requests to scale up. From experimentation, it appears that autoscaling groups may have a slow poll loop that checks if new instances are needed, so the delay between requesting a scale up and the instance starting can exceed 60 seconds. For us, this negates the benefit of the warm pool.

I pulled this from the article, but it's the same problem. Technically yes, eks + fargate works. In practice the response times from "thing added to queue" to "node is responding" is minutes with that setup.

Out of curiosity what CI system are you using with Perforce?

maccard2y ago

We use buildkite with a customised version of https://github.com/improbable-eng/perforce-buildkite-plugin/

Our game code is in P4, but our backend services are on GH. Having a single CI system means we get easy interop e.g. game updates can trigger backend pipelines and vice versa.

In the past I've used TeamCity, Jenkins, and ElectricCommander(!)

fduran2y ago· 4 in thread

So I've created ~300k ec2 instances with SadServers and my experience was that starting an ec2 VM from stopped took ~30 seconds and creating one from AMI took ~50 seconds.

Recently I decided to actually look at boot times since I store in the db when the servers are requested and when they become ready and it turns out for me it's really bi-modal; some take about 15-20s and many take about 80s, see graph https://x.com/sadservers_com/status/1782081065672118367

Pretty baffled by this (same region, same pretty much everything), any idea why? Definitely going to try the trick in the article.
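A small Python sketch of how boot durations like the ones described above could be bucketed to make a bimodal distribution visible. The record layout (pairs of requested and ready timestamps in epoch seconds) and the sample values are made up for illustration; they are not the SadServers data.

```python
from collections import Counter

def boot_histogram(records, bucket_s=10):
    """Count boots per `bucket_s`-wide bucket of (ready - requested) seconds."""
    counts = Counter()
    for requested, ready in records:
        duration = ready - requested
        # Round the duration down to the start of its bucket.
        bucket = int(duration // bucket_s) * bucket_s
        counts[bucket] += 1
    return dict(sorted(counts.items()))

# Made-up sample: (requested, ready) in epoch seconds.
sample = [(0, 17), (100, 118), (200, 281), (300, 379), (400, 482)]
print(boot_histogram(sample))
```

Splitting the same histogram by purchase option (Spot versus on-demand) or by Availability Zone would be one way to test the explanations suggested in the replies.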

paranoidrobot2y ago

My guess is probably related to AWS Spot capacity.

The second and third spikes at 80 and 140 seconds line up nicely with this kind of behavior.

The second spike would be optimised workloads that can respond to spot interruption in under 60 seconds.

The third spike would be Spot workloads that are being force-terminated.

The reason it's falling on those bounds is that whatever is trying to schedule your workload only re-checks for free capacity once a minute.

I used to be able to spin up spot instances and basically never get interruptions. They'd stay on for weeks/months.

In my experience, it used to be fairly safe to have Spot instances for most workloads. You'd almost never get Spot interruptions. Now, some regions and instance types are difficult to run Spot instances at all.

fduran2y ago

Thanks, Spot capacity being scheduled differently would explain the behavior.

Almost all my ec2 instances are spot, and actually I can compare the distribution with the on-demand ones.

My spot instances are very short lived (15-30 mins max) and AFAIK I've never seen a spot instance force-terminated (this would be hard to find I think).

[0] https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-au...

fletchowns2y ago

Perhaps in one case you are getting a slice of a machine that is already running, versus AWS powering up a machine that was offline and getting a slice of that one?

fduran2y ago

Yes, some internal (AWS operation) explanation like the one you suggest makes sense.

albert_e2y ago· 3 in thread

AWS will (should) make this an optional feature.

Often the technology is the easier part.

The difficult part is how to name the feature intuitively, adding to an ocean of jargon and documentation, and making the configuration knobs intuitive both in UI and CLI/SDK.

Amazon Simple Compute Service :) ?

kylegalbraith2y ago

Other founder of Depot here. AWS is pretty close to this idea with their Warm Pools [0]. But for our use case, they're just too slow to react to changes. We observed 60s+ to notice a change and actually start the machine. That doesn't work when we need to launch the machine as quickly as possible in reaction to a pending GHA job.

That said, I think this is a problem they could likely solve with that functionality, and we'd love to use it.

ldoughty2y ago

The article talks about fio to warm the drive... That's basically fast snapshot restore[0]. This reduces the "first access" penalty for "dirty" blocks. This is probably the slowest part of the entire article (it's about 10 seconds per dirty GB to fio the disk).

[0]: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-fast-sn...

immibis2y ago

AWS Lambda.

mnutt2y ago· 2 in thread

They talk about the limitations of the EC2 autoscaler and mention calling LaunchInstances themselves, but are there any autoscaler service projects for EC2 ASGs out there? The AWS-provided one is slow (as they mention), annoyingly opaque, and has all kinds of limitations like not being able to use Warm Pools with multiple instance types etc.

mdaniel2y ago

I am a little confused by your mention of "EC2 autoscaler" and then "EC2 ASG" autoscaler, but if I'm hearing you correctly and you'd want "self managed ASGs," then you may have some success adapting Keda <https://github.com/kedacore/keda#readme> (or your-favorite-event-driven-gizmo) to monitor the metrics that interest you and driving ec2.LaunchInstances on the other side, since as very best I can tell that's what ASGs are doing just using their serverless-event-something-or-other versus your serverless-event-something-or-other. I would suspect you could even continue to use the existing ec2.LaunchTemplate as the "stamp out copies of these" system, since there doesn't appear to be anything especially ASG-y about them, just that is the only(?) consumer thus far

After having typed all that out, I recalled that Open Stack exists and thus they may get you ever further toward your goal since they are trying to be on-prem AWS: https://docs.openstack.org/auto-scaling-sig/latest/theory-of...

Yeah that's basically what asg does, you can see the createFleet requests in cloudtrail
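A hedged Python sketch of the "self-managed ASG" idea discussed above: something watches a metric and launches instances from an existing launch template. The EC2 API call for launching is RunInstances, which boto3 exposes as `run_instances`. The `ec2` argument is expected to behave like `boto3.client("ec2")`; the function names and the one-instance-per-pending-job policy are hypothetical, and the template name would be your own.

```python
def launch_from_template(ec2, template_name: str, count: int) -> list:
    """Launch `count` instances from a launch template; return their instance IDs."""
    if count <= 0:
        return []
    response = ec2.run_instances(
        LaunchTemplate={"LaunchTemplateName": template_name, "Version": "$Latest"},
        MinCount=count,  # all-or-nothing: fail rather than launch fewer
        MaxCount=count,
    )
    return [instance["InstanceId"] for instance in response["Instances"]]


def scale_to_queue(ec2, template_name: str, pending_jobs: int, running: int) -> list:
    """Hypothetical policy: one instance per pending job not already covered."""
    return launch_from_template(ec2, template_name, max(0, pending_jobs - running))
```

Passing the client in as a parameter keeps the policy testable without AWS credentials; scale-down, retries, and capacity errors are left out of this sketch.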

Nextgrid2y ago· 2 in thread

I don't get why they're using EBS here to begin with. EBS trades off cost and performance for durability. It's slow because it's a network-attached volume that's most likely also replicated under the hood. You use this for data that you need high durability for.

It looks like their use-case fetches all the data it needs from the network (in the form of the GH Actions runner getting the job from GitHub, and then pulling down Docker containers, etc).

What they need is a minimal Linux install (Arch Linux would be good for this) in a squashfs/etc and the only thing in EBS should be an HTTP-aware boot loader like iPXE or a kernel+initrd capable of pulling down the squashfs from S3 and running it from memory. Local "scratchspace" storage for the build jobs can be provided by the ephemeral NVMe drives which are also direct-attach and much faster than EBS.

jedberg2y ago

By using EBS they don't have to wait for disk to fill from network on second+ boot.

Nextgrid2y ago

Ah so they are keeping the machines around? Do they need to do that - does the GH runner actually persist anything worth keeping in between runs?

uavoperator2y ago· 2 in thread

This is really only tangentially related to the article, but

>If AWS responds that there is no current capacity for m7a instances, the instance is updated to a backup type (like m7i) and started again

Any ideas why m7i would be chosen as the backup type rather than the other way around? m7a seems to be more expensive than m7i, so maybe there's some performance advantage or something else I'm missing that makes AMD CPU containing instances preferable to Intel ones?
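The fallback behaviour quoted from the article can be sketched in Python as follows. This is not Depot's code: the function name is hypothetical, and `ec2` is expected to behave like `boto3.client("ec2")`, using its `start_instances` and `modify_instance_attribute` calls and the `InsufficientInstanceCapacity` error code.

```python
def start_with_fallback(ec2, instance_id: str, fallback_types: list) -> str:
    """Start a stopped instance, switching type when capacity is unavailable.

    Returns "current" if the existing type started, otherwise the fallback
    type that worked.
    """
    def try_start() -> bool:
        try:
            ec2.start_instances(InstanceIds=[instance_id])
            return True
        except Exception as err:  # botocore's ClientError carries `.response`
            code = getattr(err, "response", {}).get("Error", {}).get("Code")
            if code != "InsufficientInstanceCapacity":
                raise  # unrelated failure: do not mask it
            return False

    if try_start():
        return "current"
    for instance_type in fallback_types:
        # The instance type can only be changed while the instance is stopped.
        ec2.modify_instance_attribute(
            InstanceId=instance_id, InstanceType={"Value": instance_type}
        )
        if try_start():
            return instance_type
    raise RuntimeError("no capacity for any candidate instance type")
```

The order of `fallback_types` encodes the preference the question is about, so swapping m7a and m7i is a one-line change for the caller.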

tenplusfive2y ago

At least with other instance types (m5,m6,t3) it was the case that the AMD processors were cheaper. As it turns out, this does not seem to be a general rule.

It seems like performance wise the AMD processors are (in certain workloads) quite a bit faster than their Intel equivalent: https://www.phoronix.com/review/aws-m7a-ec2-benchmarks/2 (in later pages it seems to be a little bit more mixed)

crohr2y ago

m7i CPU is in the same ballpark as m7a (https://runs-on.com/benchmarks/aws-ec2-instances/). When you look at the interruption percentage for m7a I think m7i (not m7i-flex if you don't want burstable instances) is probably the better choice. But I suppose it depends on availability in their specific zones.

paulddraper2y ago· 2 in thread

> From a billing perspective, AWS does not charge for the EC2 instance itself when stopped, as there's no physical hardware being reserved; a stopped instance is just the configuration that will be used when the instance is started next. Note that you do pay for the root EBS volume though, as it's still consuming storage.

Shut-down standbys are absolutely the way to do it.

Does AWS offer anything for this? It's very tedious to set up.

tekla2y ago

Warm pools

paulddraper2y ago

yep, that's it, thank you kind person

everfrustrated2y ago· 1 in thread

It's too bad that EBS doesn't natively support Copy-On-Write.

Snapshots are persisted into S3 (transparently to the user) but it means each new EBS volume spawned doesn't start at full IOPS allocation.

I presume this is due to EBS volumes being AZ-specific, so to be able to launch an AMI-seeded EBS volume in any AZ it needs to go via S3 (multi-AZ).

Twirrim2y ago

EBS volumes are "expensive" compared to S3, due to the limitations of what you can do with live block volumes + replicas, vs S3. It takes more disk space to have an image be a provisioned volume ready to be used for copy-on-write, vs having it as something backed up in S3. So the incentives aren't there vs just trying to make the volume creation process as smooth and fast as possible.

I'd guess it's likely that EBS is using a tiered caching system, where they'll keep live volumes around for Copy-on-write cloning for the more popular images/snapshots, with slightly less popular images maybe stored in an EBS cache of some form, before it goes all the way back to S3. You're just not likely to end up getting a live volume level of caching until you hit a certain threshold of launches.

waiwai9332y ago· 1 in thread

I believe this is similar to EC2 Fast Launch which is available for Windows AMIs, but I don't know exactly how that works under the hood.

mcbain2y ago

It does launch an instance and take a snapshot, but what's happening is the sysprep and OOBE stuff that can take 10 mins or so (you can find it in the console and startup logs). That's a lot more overhead than just hydrating an EBS volume.

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/win-a...

cmckn2y ago· 1 in thread

You can enable fast restore on the EBS snapshot that backs your AMI: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-fast-sn...

It’s not cheap, but it speeds things up.

1. https://r2.nathants.workers.dev/ec2_snitch.png

$540/month per EBS volume per AZ. And it’s still fairly limited: at a maximum of 8 credits, it wouldn’t nearly cover the use case described in the article (launching 50 instances quickly).
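
The arithmetic behind that figure, as a quick sketch. The hourly rate is not stated in the thread; it is back-computed here from the $540 number assuming a 30-day month, and `monthly_fsr_cost` is a hypothetical helper, so check current pricing before relying on it.

```python
HOURS_PER_MONTH = 24 * 30  # 30-day month, matching the $540 figure above


def monthly_fsr_cost(hourly_rate_usd: float, availability_zones: int,
                     snapshots: int = 1) -> float:
    """Monthly cost of keeping fast restore enabled, billed per AZ."""
    return hourly_rate_usd * HOURS_PER_MONTH * availability_zones * snapshots


if __name__ == "__main__":
    implied_rate = 540 / HOURS_PER_MONTH  # hourly rate implied by the comment
    # Enabling it in three AZs triples the bill.
    print(monthly_fsr_cost(implied_rate, availability_zones=3))
```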

suryao2y ago· 1 in thread

This is a very cool optimization.

I make a similar product offering fast GitHub Actions runners[1] and we've been down this rabbit hole of boot time optimization.

Eventually, we realized that the best solution is to actually build scale. There are two factors in your favor then: 1) Spikes are less pronounced and the workloads are a lot more predictable. 2) The predictability means that you have a decent estimate of the workload to expect at any given time, within reason for maintaining an efficient warm pool.

This enables us to simplify the stack and not have high-maintenance optimizations while delivering great user experience.

We have some pretty heavy use customers that enable us to do this.

[1] https://www.warpbuild.com

matt-p2y ago

This is almost always the answer; adding an instance should be a fairly rare event.

crohr2y ago

> while we can boot the Actions runner within 5 seconds of a job starting, it can take GitHub 10+ seconds to actually deliver that job to the runner

This. I went the same route with regards to boot time optimisations for [1] (cleaning up the AMI, cloud-init, etc.), and can boot a VM from cold in 15s (I can't rely on prewarming pools of machines -- even stopped -- since RunsOn doesn't share machines with multiple clients and this would not make sense economically).

But the official runner binary always takes around 8s to load and then get assigned a job by GitHub, which is more than half of the VM boot time :( At some point it would be great if GitHub could give us a leaner runner binary with less legacy stuff, tailored for ephemeral runners (that, or reverse-engineer the protocol).

[1] https://runs-on.com

solatic2y ago

Makes me wonder why Depot isn't moving to on-prem hardware. When you're reselling compute with a better API, you give up a substantial proportion of your profits to the hyperscaler while offering worse performance (due to being held hostage to the hyperscaler's design decisions, like lazy loading root EBS from S3).

Surely an optimized approach here looks something like booting customer CI workloads directly from the hypervisor, using an ISO/squashfs/etc. stored directly on the hypervisor, where the only networked disks are the ones with the customers' BuildKit caches?

bingemaker2y ago

Curious, how do you measure the time taken for those 4 steps listed in the "What takes so long?" section?

orf2y ago

It seems that you want to make your root volume as small as possible, and use it to only attach a pre-warmed pool of EBS volumes at launch time that contain the actual config/data you need?

You can launch a stripped down distribution with what, a 200mb disk? Then attach the “useful” EBS volume, and “do stuff” with that - launch a container, or whatever.

nathants2y ago

in the us-west-2-lax-1a local zone, i just booted 100 r5.xlarge spot instances as fortnite like game servers[1]. 1 to be a central server, 99 to be fake players. the server broadcasts x100 write amplified data from every player to every player. the 101st server is my local pc.

the server broadcasts at 200 MB/s[2]. the whole setup costs me $3-4 usd/hour and by far the slowest part of boot is my game compiling on the central server, whether i store ccache data in s3 or not. i've booted this every day for the last 6 months, to test the game.

if your system can't handle 30s vm boots, your system should improve.

2. https://r2.nathants.workers.dev/ec2_boot.mp4

broknbottle2y ago

minimizing the image definitely helps.

https://aws.amazon.com/blogs/apn/how-to-build-sparse-ebs-vol...

https://netflixtechblog.medium.com/datastore-flash-upgrades-...

https://github.com/Netflix-Skunkworks/s3-flash-bootloader

elchief2y ago

I've noticed that Amazon Linux 2023 boots faster than Ubuntu too

mediumsmart2y ago

Very cool. How many seconds of the faster boot time fit into one regular second?

thefaux2y ago

Whenever I see the flaws in aws ux, I remember that they bill by the hour.

pid-12y ago

Can you have warm pools of spot instances?

gtirloni2y ago

> tl;dr — boot the instance once, shut the instance down, then boot it again when needed

their own tldr should be at the top not middle of the article :)

develatio2y ago· 13 in thread

torginus2y ago

It also takes a few seconds on AWS. The guy is comparing setting up a whole new machine from an image, with network and all, to turning on a stopped EC2 instance.

The latter takes a few seconds, the former is presumably longer. This is the great revelation of this blog post.

dylan6042y ago

wait, restarting a stopped machine is faster than launching an AMI from scratch is a great revelation?

That's like saying waking your MacBook Pro is faster than booting from a powered-off state. Of course it is, and that's precisely why the option exists.

CaptainOfCoit2y ago

Smaller companies are faster and more nimble than larger corporations.

https://docs.aws.amazon.com/codebuild/latest/userguide/actio...

necovek2y ago· 11 in thread

I wonder if Amazon would ever decide to offer booting the same image with the same hypervisor in EC2 as they do for lambdas?

20thr2y ago

This is why CodeSandbox, Namespace, and even fly.io built special-purpose architectures to guarantee extremely fast start-up time.

In the case of Namespace it's ~2sec on cold boots with a set of user-supplied containers, with storage allocations.

(Disclaimer, I'm with Namespace -- https://namespace.so)

arianvanp2y ago

And AWS now has a product to spin up Lambdas for GitHub Actions CI runners

fhuici2y ago

cr125rider2y ago

Fargate is an alternative that runs on Firecracker as well. It's hidden behind ECS and EKS, however.

mochomocha2y ago

[1]: https://justingarrison.com/blog/2024-02-08-fargate-is-not-fi...

plopz2y ago

Fargate is too slow without the container cache you can get with ec2

scarface_742y ago

And CodeBuild…

solarengineer2y ago

necovek2y ago

Wow, that's amazing!

I wonder if I should try out Illumos as I am rebuilding my home server, but I am afraid that due to lack of time, it'd take me ages to replicate my libvirt-based setup with around 30 services.

How well are Linux containers and VMs supported in Illumos? How about nested KVM? That could help with the transition as I am heavily leaning into GNU tools and KVM.

oblio2y ago

> It is unfortunate that Illumos is not popular enough for a multitude of reasons.

It's complicated, but you can blame Sun for a lot of that. The primary thing was their explicit choice of license to make it incompatible (or hard to make compatible) with Linux.

solarengineer2y ago

I should add - this is a full Zone with a number of services. One of my half-finished projects was to trace the call graph of the various services started and disable those not needed.

jedberg2y ago· 10 in thread

This feels like a reasonable tradeoff if you can justify the cost with better auto-scaling.

And if you're not autoscaling, it's still worth the cost if the trade off is having your engineers wait around for instance boots.

fhuici2y ago

deivid2y ago

Unikraft is really cool, but Linux is not necessarily the blocker. You can boot to PID1 in firecracker in ~6ms, see my experiments: https://blog.davidv.dev/minimizing-linux-boot-times.html

pradn2y ago

Are customers that are so cost sensitive served well by public clouds and attendant infrastructure?

jedberg2y ago

Netflix scales on short intervals for reliability. Scale up quickly to handle a spike in traffic, scale down slowly to save money.

They also do proactive scaling to address the issues you brought up, since they can predict with fairly high accuracy the normal viewing patterns.

For example scaling up on Saturday morning ahead of all the kids waking up.
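
The "scale up quickly, scale down slowly" idea can be sketched as an asymmetric rule. This is not Netflix's actual policy, just a toy illustration: `next_capacity`, `per_instance`, and `max_step_down` are hypothetical names and the numbers are made up.

```python
import math


def next_capacity(current: int, demand: int, per_instance: int,
                  max_step_down: int = 1) -> int:
    """Return the next fleet size: jump up at once, step down slowly."""
    # Instances needed to serve the demand, never fewer than one.
    needed = max(1, math.ceil(demand / per_instance))
    if needed >= current:
        return needed  # scale up immediately to absorb a spike
    # Scale down by at most `max_step_down` instances per evaluation.
    return max(needed, current - max_step_down)
```

Proactive scaling then amounts to feeding this rule a predicted demand (say, Saturday morning's) ahead of time instead of the currently observed one.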

sfilmeyer2y ago

>By definition your predictions will be more accurate the smaller the window is.

jedberg2y ago

I can come up with a lot of examples where it is easier to predict further out[0], but that also means I can predict them 20 seconds out. :)

bushbaba2y ago

In this day and age everyone should use autoscaling, relying on ODCRs (On-Demand Capacity Reservations) to guarantee that resources exist.

cricketlover2y ago

wongarsu2y ago

cyberpunk2y ago

There’s no one answer for it, you need to learn your traffic / resource usage patterns and tune the scaling to match your situation.

No shortcuts really, although a lot of web applications behave “kinda” similar.

Start conservatively and tweak from there.

immibis2y ago· 9 in thread

Do we actually need all this stuff, or does it suffice to get one really powerful server (price less than $40k) and run Docker on it?

iudqnolq2y ago

That doesn't solve the same problem.

GitHub actions in the standard setup needs to run untrusted code and so you essentially need a VM.

You can lock it down at the cost of sacrificing features and usability, but that's a tradeoff.

fhuici2y ago

mike_hearn2y ago

A really powerful server should not cost you anywhere near $40k unless you're renting bare metal in AWS or something like that.

Getting rid of the overhead is possible but hard, unless you're willing to sacrifice things people really want.

immibis2y ago

$40k to buy the server, not to rent per month.

If you know who's using your build server, you probably don't need isolation stronger than Docker, because they can go to jail for hacking it.

necovek2y ago

Do you have an example image and network config that would demonstrate that?

(I'd love to understand the performance limits of Docker containers, but never played with them deeply enough since they are usually in >1s space which is too slow for me to care)

fhuici2y ago

Underneath, we use specialized VMs (unikernels), a custom controller and load balancer, as well as a number of perf tweaks to achieve this. But it's (now) certainly possible.

cjk22y ago

I'm mostly just running the (Go) compiler on my laptop which is considerably faster than on docker and considerably cheaper than the server...

I mean an ass end M3 macbook has the same compile time as an i9-14900k. God knows what an equivalent Xeon/Epyc costs...

immibis2y ago

benwaffle2y ago

reminds me of https://world.hey.com/dhh/we-re-moving-continuous-integratio...

amluto2y ago· 8 in thread

I don’t use EC2 enough to have played with this, but a big part here is the population of the AMI into the per-instance EBS volume.

ISTM one could do much better with an immutable/atomic setup: set up an immutable read-only EBS volume, and have each instance share that volume and have a per-instance volume that starts out blank.

Szpadel2y ago

we used s3fs in production. please don't use it: it's unreliable, has unpredictable failure modes, and can bring the whole instance down. if you really need something like that, use rclone mount

mdaniel2y ago

[0] https://news.ycombinator.com/item?id=39363499

Isn’t EBS multi-attach only available for the (very expensive) io1 / io2 volume types?

amluto2y ago

Hmm, it does look like it, although one could carefully use large IO.

attentive2y ago

That's reinventing ebs/ami/snapshots. They are already doing it i.e. data goes lazily from s3 to ebs/ec2.

amluto2y ago

antihero2y ago

Could you make a snapshot of the booted instance then and boot other instances from that?

amluto2y ago

Fundamentally, EBS volumes are designed to work more or less like actual disks, whereas modern scaled workloads want something closer to net-booted diskless machines.

maccard2y ago· 8 in thread

I don't use GHA as some of our code is stored in Perforce, but we've faced the same challenges with EC2 instance startup times on our self managed runners on a different provider.

We would happily pay someone like depot for "here's the AMI I want to run & autoscale, can you please do it faster than AWS?"

CaptainOfCoit2y ago

> We would happily pay someone like depot for "here's the AMI I want to run & autoscale, can you please do it faster than AWS?"

maccard2y ago

Can you maybe share some examples? We're fine to use other image formats, but a lot of the value of AWS is that the services interact, IAM works nicely together, etc.

Fly.io comes up often [0] on HN, but there's an overwhelming amount of "it's a nice idea, but it just doesn't work" feedback on it.

abatilo2y ago

maccard2y ago

But, Thanks for the heads up.

Szpadel2y ago

maccard2y ago

Out of curiosity what CI system are you using with Perforce?

maccard2y ago

We use buildkite with a customised version of https://github.com/improbable-eng/perforce-buildkite-plugin/

Our game code is in P4, but our backend services are on GH. Having a single CI system means we get easy interop e.g. game updates can trigger backend pipelines and vice versa.

In the past I've used TeamCity, Jenkins, and ElectricCommander(!)

fduran2y ago· 4 in thread

So I've created ~300k ec2 instances with SadServers and my experience was that starting an ec2 VM from stopped took ~30 seconds and creating one from AMI took ~50 seconds.

Pretty baffled by this (same region, same pretty much everything), any idea why? Definitely going to try the trick in the article.

paranoidrobot2y ago

My guess is probably related to AWS Spot capacity.

The second and third spikes at 80 and 140 seconds line up nicely with this kind of behavior.

The second spike would be optimised workloads that can respond to spot interruption in under 60 seconds.

The third spike would be Spot workloads that are being force-terminated.

The reason it's falling on those bounds is that whatever is trying to schedule your workload only re-checks for free capacity once a minute.

I used to be able to spin up spot instances and basically never get interruptions. They'd stay on for weeks/months.

fduran2y ago

Thanks, spot capacity being scheduled differently would explain the behavior.

Almost all my ec2 instances are spot, and actually I can compare the distribution with the on-demand ones.

My spot instances are very short lived (15-30 mins max) and AFAIK I've never seen a spot instance force-terminated (this would be hard to find I think).

[0] https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-au...

fletchowns2y ago

Perhaps in one case you are getting a slice of a machine that is already running, versus AWS powering up a machine that was offline and getting a slice of that one?

fduran2y ago

Yes, some internal (AWS operation) explanation like the one you suggest makes sense.

albert_e2y ago· 3 in thread

AWS will (should) make this an optional feature.

Often the technology is the easier part.

The difficult part is how to name the feature intuitively, adding to an ocean of jargon and documentation, and making the configuration knobs intuitive both in UI and CLI/SDK.

Amazon Simple Compute Service :) ?

kylegalbraith2y ago

That said, I think this is a problem they could likely solve with that functionality, and we'd love to use it.

ldoughty2y ago

[0]: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-fast-sn...

immibis2y ago

AWS Lambda.

mnutt2y ago· 2 in thread

mdaniel2y ago

Yeah that's basically what ASG does, you can see the CreateFleet requests in CloudTrail

Nextgrid2y ago· 2 in thread

It looks like their use-case fetches all the data it needs from the network (in the form of the GH Actions runner getting the job from GitHub, and then pulling down Docker containers, etc).

jedberg2y ago

By using EBS they don't have to wait for disk to fill from network on second+ boot.
