Can you expand on this? Why should the application break? It still has its guaranteed cpu requests. If it breaks with the defined requests, shouldn't it always break?
In this case it was a GraphQL implementation that gradually grew in size, complexity and scope. The team maintaining it never adjusted the initial resource requests in about 2 years because it had never been a problem; the service had effectively been living on burst capacity above its requests. Then a different service (much larger by total CPU allocated) started consuming all the bursting it could get, and the GraphQL service just stopped processing requests. Before any autoscaling could kick in, it went into CrashLoopBackOff with failing liveness probes.
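To make the "why doesn't it always break" part concrete, here is a rough Python sketch of how CPU gets divided when nobody has limits: proportionally to requests, with unused share handed to whoever still wants more. The function name, the pod names and all the numbers are made up for illustration; this is a toy model of weighted fair sharing, not the kernel scheduler itself.

```python
def allocate_cpu(capacity, pods):
    """Split `capacity` cores among pods that have no CPU limits.

    pods maps name -> (request, demand), both in cores; requests must be > 0.
    CPU is shared in proportion to requests, and whatever a pod does not
    need is redistributed to the pods that are still hungry.
    """
    grant = {}
    active = dict(pods)
    remaining = float(capacity)
    while active:
        total_request = sum(req for req, _ in active.values())
        # Proportional share of what is left, weighted by request.
        share = {name: remaining * req / total_request
                 for name, (req, _) in active.items()}
        satisfied = [name for name, (_, demand) in active.items()
                     if demand <= share[name]]
        if not satisfied:
            # Everyone wants more than their share: contention, no bursting.
            grant.update(share)
            break
        for name in satisfied:
            _, demand = active.pop(name)
            grant[name] = demand
            remaining -= demand
    return grant


# Hypothetical 8-core node: the GraphQL pod requests 0.5 but now needs 2.0.
quiet = allocate_cpu(8, {"graphql": (0.5, 2.0), "big": (7.5, 1.0)})
busy = allocate_cpu(8, {"graphql": (0.5, 2.0), "big": (7.5, 16.0)})
print(quiet["graphql"])  # 2.0 -> neighbour is idle, bursting covers the gap
print(busy["graphql"])   # 0.5 -> neighbour is busy, only the request is left
```

With an idle neighbour the service quietly gets four times its request, so nothing looks wrong for years; once the neighbour saturates the node, it is squeezed back to a request it outgrew long ago.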
You can and should catch such cases early with monitoring, but our platform team was extremely tiny, especially for how many developers it served.
EDIT: the Vertical Pod Autoscaler helps with this, as it adjusts the resource requests on pods so that they don't over- or under-allocate.
The whole point is to scale out, not up.
KEDA [0] lets you scale on any number of things - basically if you scrape it with Prometheus, you can scale on it.
[0] https://keda.sh
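As far as I know, KEDA feeds its metrics to the regular Horizontal Pod Autoscaler, and the core rule documented for the HPA is a simple ratio. A minimal Python sketch of that rule follows; the function name and the example numbers are hypothetical, and the real controller also applies a tolerance band, min/max replica bounds and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value):
    """Core HPA rule: scale replicas by the ratio of observed to target metric.

    current_value is the observed per-pod average of the metric and
    target_value is the per-pod value you want to hold (must be > 0).
    """
    return math.ceil(current_replicas * current_value / target_value)


# Hypothetical metric: 150 queued messages per pod against a target of 100.
print(desired_replicas(3, 150, 100))  # 5
# Load drops to 50 per pod across 4 pods: scale in.
print(desired_replicas(4, 50, 100))   # 2
```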
You can layer high-priority and low-priority services better if you use some buffer.
Would it be possible for you to explain "you can layer high-priority and low-priority services better if you use some buffer" further?
You want the low-priority cgroups to use the extra memory if it's available (not used by a high-priority cgroup).
If a high-priority cgroup needs the memory, it steals it from the low-priority cgroup, down to the minimum guaranteed to that low-priority cgroup.
The memory contention in the low-priority cgroup then turns into IO pressure from page faults, and the IO limit on the low-priority cgroup caps how much IO it can generate by throttling it.
Meanwhile the high-priority cgroup uses its guaranteed CPU, guaranteed IO and guaranteed memory with no hiccups.
To learn more, check "fbtax2 memory controller configuration" at https://facebookmicrosites.github.io/cgroup2/docs/memory-con...
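The memory part of that layering can be sketched in a few lines of Python. This is a toy model with invented names and numbers, not the kernel's reclaim logic: it only captures "high priority takes what it needs, low priority keeps a guaranteed floor" (similar in spirit to the cgroup v2 memory protection settings) and does not model the IO throttling step at all.

```python
def split_memory(total, high_demand, low_min, low_demand):
    """Return (high, low) memory grants, in the same unit as the inputs.

    low_min is the hypothetical guaranteed minimum of the low-priority group.
    """
    # The low-priority group is never squeezed below its guaranteed floor
    # (or below what it actually uses, if that is less).
    low_floor = min(low_min, low_demand)
    # The high-priority group takes what it needs from everything above it.
    high = min(high_demand, total - low_floor)
    # The low-priority group borrows whatever is left over.
    low = min(low_demand, total - high)
    return high, low


# Hypothetical 64 GiB host, low-priority group guaranteed 8 GiB, wants 40 GiB.
print(split_memory(64, 20, 8, 40))  # (20, 40) -> spare memory is lent out
print(split_memory(64, 50, 8, 40))  # (50, 14) -> high priority steals it back
print(split_memory(64, 60, 8, 40))  # (56, 8)  -> stealing stops at the floor
```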
In the context of pods dynamically spinning up and down, it's bad when a pod replica can't be allocated in the cluster "predictably", but there is nothing worse than a new pod (new deployment) failing because "Marcus the pod" drank all the water, and now I have to call DevOps and wait god knows how long before they spin up a new node to guarantee a spot for the new pod.
Bin-packing is already an NP-hard problem. If you remove limits from CPU, then you're adding probabilities into the mix. So, for the love of god, always use limits unless you have a very specific use case.
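For anyone who hasn't met bin-packing before, here is a small Python sketch of the classic first-fit-decreasing heuristic, with pods as items and nodes as bins. The function name and the pod sizes are hypothetical, and this is not how the Kubernetes scheduler actually places pods; the point is only that the packing is predictable when every pod has a fixed, known size, which stops being true once a pod's real CPU use is unbounded.

```python
def first_fit_decreasing(pod_cpus, node_capacity):
    """Pack pods of fixed CPU size onto as few nodes as the heuristic finds.

    Returns a list of nodes, each a list of the pod sizes placed on it.
    """
    nodes = []
    # Placing the largest pods first tends to leave fewer awkward gaps.
    for cpu in sorted(pod_cpus, reverse=True):
        if cpu > node_capacity:
            raise ValueError(f"pod of size {cpu} does not fit on any node")
        for node in nodes:
            if sum(node) + cpu <= node_capacity:
                node.append(cpu)
                break
        else:
            # No existing node has room: a new node is needed.
            nodes.append([cpu])
    return nodes


# Hypothetical pod sizes in whole cores, packed onto 4-core nodes.
print(first_fit_decreasing([2, 1, 3, 2, 1, 3], 4))
# [[3, 1], [3, 1], [2, 2]]
```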
Cluster autoscaler. No need to call anyone.