If you have the hardware resources, why not just scale up from the start? If you don't have the resources, you'll need a lot of money anyway to pay the rent on the scaled-up capacity afterwards.
I never auto-scale interactive workloads, but it's good for batch work.
Other people have different feelings. Consider the case where you release software multiple times a day, but it has a memory leak. You don't notice this memory leak because you're restarting the application so often. But the Winter Code Freeze shows up, and your app starts running out of memory and dying, paging you every day during your time off. If you had horizontal autoscaling, you would just increase the amount of memory that your application has until you come back and fix it. Sloppy? Sure. But maybe easier to buy some RAM for a couple weeks and not disrupt people's vacation. (The purist would argue their vacation was ruined the day they checked in the memory leak.) This gets all the more fun when the team writing the code and the team responsible for the error rate in production are different teams in different time zones. I don't think that's a healthy way to structure your teams, but literally everyone else on earth disagrees with me, so... that's why there's a product that you can sell to the infrastructure team instead of telling the dev team "wake up and call free() on memory you're not using anymore".
What happened is we'd have a queue processor that normally needed a couple of pods to handle events. Except that once a day another process would drop 5 million requests into the queue.
So I just had a simple KEDA autoscaler based on the length of the queue: one pod for every 10,000 items in the queue, with a minimum of 2 pods and a maximum of 50.
It would scale up after the big queue dumps, chew through the backlog, and then scale back down again.
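For reference, that kind of setup maps directly onto a KEDA ScaledObject. A minimal sketch, assuming the queue is RabbitMQ (the comment doesn't say which queue technology was used, and all names here are made up):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-processor        # hypothetical name
spec:
  scaleTargetRef:
    name: queue-processor      # the Deployment to scale
  minReplicaCount: 2           # floor of 2 pods
  maxReplicaCount: 50          # ceiling of 50 pods
  triggers:
    - type: rabbitmq           # assumed queue backend
      metadata:
        mode: QueueLength
        value: "10000"         # one pod per 10,000 queued items
        queueName: events
      authenticationRef:
        name: rabbitmq-auth    # TriggerAuthentication holding the connection string
```

KEDA computes the desired replica count as queue length divided by `value`, clamped to the min/max, which is exactly the "one pod per 10,000 items" behaviour described above.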
That said, you could conceivably live at a higher abstraction.
Take dev environments, for example. Ideally the team working on infra problems does not need to care how many versions of a backend are running in the dev environment.
The only thing infra needs to take into account is the requested resources.
Perverse incentives to waste resources aside, it's nice when you can hold fewer variables in your mind while focusing on your own areas of responsibility; it allows deeper intuition and creativity, at the sacrifice of some cross-cutting creativity across teams.
> The only thing infra needs to take into account is the requested resources.
> Perverse incentives on wasting resources aside
(I do infra.) That's like, 95% of the problem. AFAICT, most devs have absolutely no idea how powerful a computer is.
My last change was to resize a 400-core, 800 GiB set of compute down to a 100-core, 150 GiB set. It was just ludicrously over-provisioned, because the dev teams aren't incentivized to care at all. (…sadly, I'm not allowed to go out and hire a dev now, even though I literally just saved that much money in cloud costs…)
(It's still over-provisioned, but that was the easy "we can lop this compute off and I promise you won't notice" part.)
The economics/incentives at play are the hard part. Getting management to stop looking at infra for "ehrmagerd the cloud bill" and instead devote dev time to digging into "why is this app, which ostensibly just shuffles JSON about the landscape, using 8 cores and all the RAM?" is … tough. And not what I signed up for in SWE, damn it.
The other way is equally bad: devs find they've run out of resources? Knee-jerk is "resize compute upwards" not some introspection of "wait, what is a reasonable amount of CPU use for JSON shuffling?"
Usage graphing is the other tool that really puts some devs' work in a rather bad light: resource requests of like 20 CPU, but the usage graph says "0.02 CPU". So … at least the code's not inefficient, but the requested resources are wasted.
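A toy sketch of that requests-vs-usage gap, using the illustrative 20-CPU-requested / 0.02-CPU-used numbers above (the function names are mine, not from any real tooling):

```python
# Quantify the gap between requested CPU and observed usage.
# Figures mirror the 20-CPU-request / 0.02-CPU-usage example above;
# they are illustrative, not from a real cluster.

def overprovision_ratio(requested_cpu: float, used_cpu: float) -> float:
    """How many times more CPU was requested than actually used."""
    return requested_cpu / used_cpu

def stranded_cpu(requested_cpu: float, used_cpu: float) -> float:
    """Cores reserved by the scheduler but sitting idle."""
    return requested_cpu - used_cpu

ratio = overprovision_ratio(20.0, 0.02)  # ~1000x over-requested
idle = stranded_cpu(20.0, 0.02)          # ~19.98 cores the scheduler
                                         # can't hand to anyone else
print(f"{ratio:.0f}x over-requested, {idle:.2f} cores stranded")
```

The stranded cores are the real cost: the scheduler treats requests as reserved, so nothing else can be packed onto that capacity even though it is idle.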
Folks in other comments have answered this pretty well. Over the past couple of years, I've talked to many companies and individuals who have greatly benefited from autoscaling on k8s. Generally, it has helped in these areas:
1. Obvious case: if you run your environment on cloud providers, it can significantly save costs and improve throughput.
2. It's not just about autoscaling workloads, but also about managing batch jobs (K8s Jobs) that are triggered by events or custom metrics on demand (you can think of this as a CronJob on steroids).
3. On-prem solutions: You're right; you can use the resources you've already paid for. However, by enabling autoscaling, you can also improve the distribution and utilization of those resources. In large organizations, it is common practice for individual teams to be treated as "internal customers" with assigned quotas they can use. Autoscaling can be helpful in these scenarios as well.
If you are interested in the area, I've given several talks on K8s autoscaling, for example, our latest talk from KubeCon: https://sched.co/1YhgO
Also, when a user triggers new previews we scale up nodes to process that data. The problem there, though, is the scale-up time of the node pool, which is a few minutes for a GPU node on Azure.
We paid to have a GPU running all the time before, but that got too expensive.
As a side note, if I did it again I probably wouldn't build a data pipeline on top of KEDA ScaledJobs, and possibly wouldn't use Kubernetes at all.
For most workloads it's wasteful to have max capacity provisioned at all times if you can instead provision on-demand.
This is true in general. For example, electricity supply is a mix of baseload power (cheap but only if left running constantly) and peaking (expensive but easy to turn on and off). It wouldn't be economical to have baseload capacity equal to maximum demand. Instead it is aimed at minimum demand and other sources make up the difference depending on demand.
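The same arithmetic applies to compute. A toy cost model, with entirely made-up prices (in integer cents, to keep the arithmetic exact):

```python
# Toy cost model for the baseload-vs-peaking point above.
# All prices and demand figures are hypothetical.

BASELOAD_CENTS_PER_UNIT_HOUR = 5   # cheap, but billed around the clock
PEAKING_CENTS_PER_UNIT_HOUR = 20   # expensive, billed only when used

def cost_all_baseload(max_demand: int, hours: int = 24) -> int:
    """Provision baseload capacity equal to maximum demand."""
    return max_demand * BASELOAD_CENTS_PER_UNIT_HOUR * hours

def cost_mixed(demand_by_hour: list[int]) -> int:
    """Baseload sized to minimum demand; peaking covers the rest."""
    base = min(demand_by_hour)
    cost = base * BASELOAD_CENTS_PER_UNIT_HOUR * len(demand_by_hour)
    cost += sum((d - base) * PEAKING_CENTS_PER_UNIT_HOUR
                for d in demand_by_hour)
    return cost

# Demand curve: 30 units most of the day, a brief 2-hour peak of 100.
demand = [30] * 22 + [100] * 2

print(cost_all_baseload(max(demand)))  # 12000: pay for 100 units all day
print(cost_mixed(demand))              # 6400: 3600 baseload + 2800 peaking
```

With a short, sharp peak the mixed strategy wins easily; make the peak last most of the day and the all-baseload option wins instead, which is the same tradeoff the rest of this thread keeps running into.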
Maybe I misunderstood your question but is there a case where you can keep your entire capacity running for free? I'd assume you pay AWS/other cloud or your electricity provider.
Colo providers charge by the [rack with given network port size and power delivery], so unless you literally host on premises (which almost nobody does, even when they talk about on-prem), once you get outside of a cloud environment it rarely pays to shut down servers unless they'll be down for a long time. Maybe there'd be a business there for colo providers to offer pricing that incentivises powering down machines (almost all modern servers have IPMI, so as long as you provide the trickle of power for the IPMI board, relatively speaking, you can power the servers down and up over the network on demand), but it's not the norm.
The problem with these traffic spikes you mention is that "everybody" has them, and the overlap is significant, so they're priced in: the cloud providers need capacity to handle the worst overlap in spikes, plus margin. A 20%-30% drop is way too low to cover the cost gap between even managed servers with a huge capacity margin and most cloud providers. I've worked for a lot of different companies where we've forecast our capacity requirements, and the graphs look almost identical. Sometimes shifted n hours to account for differences in timezones, but for a lot of companies the graph is near identical globally because of similar distributions of userbase.
(If you do think you can do scaling up/down for daily spikes cost effectively, you can typically do it even more cost effectively by putting your base load in a colo'ed environment and scaling into a cloud environment if you hit large enough spikes; the irony is that in environments where I've done that, we've ended up cutting the cost of the colo'ed environment by cutting closer to the margin, and almost never end up scaling into the cloud. But it gives peace of mind, and so being prepared to use cloud has made actually using cloud services even less cost attractive.)
In practice, most places - there are exceptions that make good use of it - just set up autoscaling so they don't need to pay attention to creeping resource use. Which is rarely a good use of it.
There are good uses for autoscaling, but it's very rare for day/night or weekend/weekday cycles to be significant enough that it isn't still cheaper to buy enough capacity to take all or most of the spikes (but having the ability to scale into a cloud service might mean you only buy just enough for the "usual" weekday cycles, or even shave a little bit of the top, instead of buying enough for unexpected surges on top).
¹I think the thing here is that for most of the jobs I've worked … we're really not doing "big" things. The complexity is all business logic or how the product does what it does, not scale.
(We do provision some CI compute on-demand, so that scales with load, so it's not all fixed.)
I think the last "ooh, fun compute!" thing I did was almost 10 years ago now, when we had a huge job that needed to run. But it was sort of the opposite of the stuff in this thread: since it wasn't on-demand, we could run it whenever. That ended up being at night on spot-priced VMs, when they were cheap.
We could also talk about optimizing the costs of development and staging and sales demo workloads that don’t need to run 24/7 or even 8/5 as well.
I've set up multiple hybrid setups over the years, and what I've consistently found is that we can provision 2x-3x (more if egress is high) the amount of server capacity for the same price with managed hosting providers or in colo'ed environments than with cloud providers. That's fully loaded cost including rates for contracts for devops etc..
Very few people need to auto-scale up more than that. But since most people still want orchestration, it tends to cost little to set up their system so that if they have a spike, they can scale up extra capacity in a cloud. And in doing so, they can cut the amount of hardware they provide for the base load to whatever is cheapest.
The first times I did this, I was fully convinced going in that this would mean we'd set the base load around the lowest utilization over a typical 24/7 cycle, and spin up some cloud instances during daily peaks etc.
In practice, after actually testing the economics for a given scenario, I've yet to see scaling up/down over a typical 24/7 or 8/5 cycle pay off, though I'm sure it can for some people.
Managed servers proved, in actual real-life usage scenarios, to be sufficiently cheaper that unless your spikes were very brief and sharp[1], it was cheaper to provision enough base capacity to handle most or all of the normal daily spikes; what a hybrid setup bought us was the freedom to not overprovision for "what-if" scenarios.
That effect was significant enough that even e.g. SaaS services used almost entirely by office staff within a single time zone often don't save with auto-scaling vs. non-cloud servers scaled for peak use, because the 8-10 hour window of use that creates is far too long. It varies with your specific cloud costs and your cheapest alternative, but I've rarely found it pays to spin up cloud resources for spikes that average more than 4-6 hours in a day, and that tends to rule out most "normal" cyclical use, especially as you can often move "cronjob heavy" parts of your workload outside the window to even out the load.
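To put rough numbers on that break-even (all prices hypothetical, loosely reflecting the 2x-3x cloud premium mentioned elsewhere in this thread):

```python
# Back-of-the-envelope break-even for "scale into the cloud for the
# daily spike" vs "own enough capacity to absorb it".
# All prices are hypothetical.

FIXED_COST_PER_SERVER_MONTH = 200.0  # managed/colo server, always on
CLOUD_COST_PER_SERVER_HOUR = 0.83    # comparable on-demand instance (~3x)
DAYS_PER_MONTH = 30

def breakeven_hours_per_day() -> float:
    """Daily spike duration above which owning the capacity is
    cheaper than renting it on demand."""
    return FIXED_COST_PER_SERVER_MONTH / (
        CLOUD_COST_PER_SERVER_HOUR * DAYS_PER_MONTH
    )

# With these numbers the break-even lands around 8 hours/day, so an
# 8-10 hour office window already favours owning the capacity.
print(f"{breakeven_hours_per_day():.1f} hours/day")
```

Real break-evens tend to come in lower once egress, storage, and ops overhead are added, which is consistent with the 4-6 hour figure above.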
Auto-scaling absolutely pays at far smaller variations in load if your only option is to have all your load in a cloud environment. But even then I see a lot of people resort to auto-scaling before they've even thought of cutting the cost of their base load, e.g. by ensuring they use reserved instances where it makes sense, or by negotiating. Often ticking those boxes will have a much larger impact.
By all means ensure your system is built so that it can handle auto-scaling gracefully, though - it will benefit you whether or not you end up making much use of it.
[1] As an example where we got "close", one company I worked for had several clients that did large e-mail sends for restaurant chains that often included massive discounts. On the e-mails with the highest open rates, you'd then very predictably get traffic spikes from 8:00-8:15, 9:00-9:15, and 10:00-10:15 that were massive, as people checked their e-mail when they got into the office, with the 9:00-9:15 peak being several times higher than their normal daily use. If they had been hosting this themselves, it'd have paid to auto-scale into a cloud env. to handle those spikes, especially as they didn't send such campaigns every day. In our case, most of our other customers had reasonably quiet mornings, so we could overprovision VMs from our base capacity for them at no extra cost to us. But this was also a rare exception.
--
1. Have a baseline amount of resources and use KEDA to scale up using spot instances.
2. For video-call bots, scale up during the calls and then scale down once the call is completed. Some of our customers scale up to hundreds of machines during the day, then scale down to a couple at the end of the business day.
3. Scale up for cronjobs that require a lot of resources, where the web traffic can use significantly less.
This is pretty significant, since the 2 different apps are relatively large JVM apps, each requiring ~16 GiB of memory.
Ed-tech is a big one where you may have extremely low traffic on weekends/summer/holidays/breaks.
2. Hopefully, if you scale up, you can also scale down, which will save you money when you don't need to rent the resources.
Mon-Fri, from roughly 08:00 to 24:00: scaling between 10 and 350 VMs.
However, I wonder what you mean? Kubernetes from where I sit has almost complete ubiquity across most companies. Even in places where it's a poor fit.
Whether that means Kubernetes is dying, I'm not so sure. But Kubernetes is extremely complex for a lot of workloads that it's total overkill for, so I'm not surprised people are looking for options.
The only issue I have with the default HorizontalPodAutoscaler is that I cannot scale down to 0 when some processing queues are empty. Other than that, we have shrines erected to k8s.
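For what it's worth, scale-to-zero is exactly the gap KEDA covers: unlike the stock HorizontalPodAutoscaler, a ScaledObject may set `minReplicaCount: 0`, so the deployment disappears entirely while the queue is empty. A minimal sketch, with the queue backend and all names assumed for illustration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker           # hypothetical name
spec:
  scaleTargetRef:
    name: queue-worker         # the Deployment to scale
  minReplicaCount: 0           # scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq           # assumed queue backend
      metadata:
        mode: QueueLength
        value: "100"
        queueName: work-items
      authenticationRef:
        name: rabbitmq-auth    # TriggerAuthentication with the connection string
```

KEDA handles the 0-to-1 transition itself and hands off to an HPA it manages for 1-to-N, which is why the stock HPA alone can't do this.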