For the Love of God, Stop Using CPU Limits on Kubernetes

(home.robusta.dev)

22 points · arguflow · 1y ago · 23 comments

23 comments

12 comments · 5 top-level

gladiatr72 · 1y ago · 5 in thread

Almost 2-yo post and still as wrong as it was when it was posted.

kstrauser · 1y ago

What’s wrong about it? I haven’t managed k8s much and don’t have enough experience to evaluate their claim.

potoftea · 1y ago

I can't comment on the other statements.

But there is a serious drawback to not setting requests/limits for a mission-critical process: the kernel can kill it to free resources if the node reaches maximum resource usage.

When you don't set cpu/memory limits (with requests set), your pod's QoS class is Burstable, which is better than BestEffort, but the pod still gets assigned an `oom_score_adj` score. IMHO, 99% of the time you want `Guaranteed` for a critical process.

1. oom_score_adj - https://kubernetes.io/docs/concepts/scheduling-eviction/node...

2. Guaranteed - https://kubernetes.io/docs/tasks/configure-pod-container/qua...

3. QoS - https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/
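
To make the QoS point concrete, here is a minimal sketch of the classification rules as the Kubernetes QoS documentation describes them. The function name `qos_class` and the dict layout are assumptions made for illustration, not a Kubernetes API, and quantities are compared as plain strings, so `"500m"` and `"0.5"` would wrongly count as different.

```python
from typing import Dict, List

# Hypothetical container shape: {"requests": {...}, "limits": {...}}
Container = Dict[str, Dict[str, str]]


def qos_class(containers: List[Container]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # An unset request defaults to the limit for the same resource.
        requests = {**limits, **c.get("requests", {})}
        if limits or requests:
            any_set = True
        for res in ("cpu", "memory"):
            # Guaranteed needs a limit for both resources, equal to the request.
            if res not in limits or requests.get(res) != limits[res]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

A pod with requests but no limits comes out as Burstable, which is the case described above; only matching requests and limits for both cpu and memory on every container yields Guaranteed.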

candiddevmike · 1y ago

K8s works best (economically) when it can bin pack things. The only way it can bin pack things safely is by having user-provided limits--it's not smart enough to right size your app (out of the box at least). Not setting them means you're going to end up paying more or have resource contention/outages.

It's better to set the limits higher than you need than to not set them at all. Ideally this is easily done since you're profiling/load testing your app and you understand the appropriate sizing for it, right?
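
As a rough illustration of the bin-packing idea, the sketch below places pods onto nodes with a first-fit-decreasing heuristic. This is a textbook heuristic swapped in for clarity, not how kube-scheduler actually scores nodes; the pod names, millicore figures, and the function `first_fit_decreasing` are all hypothetical. It packs by CPU requests, since requests are what the scheduler uses for placement.

```python
from typing import Dict, List


def first_fit_decreasing(requests_m: Dict[str, int],
                         node_capacity_m: int) -> List[Dict[str, int]]:
    """Pack pods (name -> CPU request in millicores) onto identical nodes."""
    nodes: List[Dict[str, int]] = []  # pods placed on each node
    free: List[int] = []              # remaining millicores per node
    # Place the largest requests first, each on the first node with room.
    for name, cpu in sorted(requests_m.items(), key=lambda kv: kv[1], reverse=True):
        if cpu > node_capacity_m:
            raise ValueError(f"{name} requests more CPU than a node has")
        for i, remaining in enumerate(free):
            if cpu <= remaining:
                nodes[i][name] = cpu
                free[i] -= cpu
                break
        else:
            # No existing node fits, so open a new one.
            nodes.append({name: cpu})
            free.append(node_capacity_m - cpu)
    return nodes


pods = {"api": 1500, "worker": 2500, "cron": 500, "cache": 1000, "web": 2000}
placement = first_fit_decreasing(pods, node_capacity_m=4000)
```

Without declared sizes there is nothing to pack against, which is the point being made: the packing is only as good as the numbers you supply.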

crest · 1y ago

Users come to expect the performance that was never promised or even properly requested. Once you efficiently load the system they complain because you overdelivered and they got used to it.

jrockway · 1y ago

Yeah, it's a pretty weird take. I would agree if they said "a lot of people don't know what CPU limits are and set them without thinking", but "you should never use them" doesn't make any sense. For example, every computer anyone has ever used has a CPU limit. There isn't infinite CPU inside your computer. That's a CPU limit.

The mistake that surprises people is "I'm going to tell my app it can use 128 CPUs but I'm setting the CPU limit to 1." OK, well, then your app is going to be asleep 127/128th of the time. It's a surprise because the app reads /proc/cpuinfo to guess how many CPUs you have, but ... that is not the correct algorithm.

Another thing is imagining that usage spikes are going to occur randomly throughout time. If app A is under heavy use, it can steal app B's CPU shares, because who would use app A and app B at the same time? Most of the time they're both idle, so it's a waste to reserve 1 CPU for app A and 1 CPU for app B, and have app A throttled while app B is idle. But you'll probably find that everything you host is popular from 9am to 5pm local time, and for 16 hours a day you are using 0% CPU and for 8 hours a day you are using 200% CPU. The idea is to guarantee some quality of service for both apps, even at busy times. The goal is not to maximize overall throughput.

You can tune your latency vs. throughput goals if all the apps are yours, but as soon as you have different teams, I doubt team B is going to say "sure we can get paged for high request latency as long as Team A is getting as much of the CPU as they can". That's what CPU limits are for, consistency when things get tough. Not for overall utilization.
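
The "128 CPUs with a limit of 1" surprise can be sketched numerically. The code below assumes a cgroup v2 host, where the `cpu.max` file holds `"<quota> <period>"` or `"max <period>"`; the helper names are hypothetical, and the throttling model is a simplification that assumes every thread is busy for the whole period.

```python
import math
from typing import Optional


def cpu_limit_from_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max content into a CPU limit (None = unlimited)."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def worker_count(host_cpus: int, cpu_max_text: str) -> int:
    """Pick a thread count from the limit instead of the host CPU count."""
    limit = cpu_limit_from_cpu_max(cpu_max_text)
    if limit is None:
        return host_cpus
    return max(1, min(host_cpus, math.ceil(limit)))


def runnable_fraction(busy_threads: int, limit_cpus: float) -> float:
    """Share of each period the threads run before the quota is exhausted."""
    return min(1.0, limit_cpus / busy_threads)


# 128 busy threads sharing a limit of 1 CPU run 1/128 of each period.
asleep = 1.0 - runnable_fraction(128, 1.0)
```

Sizing the worker pool from the limit rather than from the host's CPU count avoids that outcome.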

danjrslp · 1y ago · 1 in thread

I used to be on the same page as the author. But then I saw tons of application teams not setting CPU limits, and coming to rely on the bursting (in other words, their requests were too low). Thus when the system came under load their application started slowing in unexpected ways.

We've had success with CPU limits, and horizontal scaling.
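
The comment does not say how the horizontal scaling is done. Assuming it is something like the Horizontal Pod Autoscaler, the documented core formula is `ceil(currentReplicas * currentMetric / desiredMetric)`. A minimal sketch, with a hypothetical function name and without the tolerance band and stabilization behaviour of the real controller:

```python
import math


def desired_replicas(current_replicas: int,
                     current_util: float,
                     target_util: float) -> int:
    """Replica count that would bring average utilization back to target."""
    if target_util <= 0:
        raise ValueError("target utilization must be positive")
    # Scale the replica count by how far utilization is from the target.
    return max(1, math.ceil(current_replicas * current_util / target_util))
```

For example, 4 replicas averaging 90% CPU against a 60% target would scale to 6.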

rblatz · 1y ago

He does address that at the end of the article with the dashboard that looks at historical usage to inform your cpu requests.
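
One way historical usage could inform a CPU request is to take a high percentile of observed usage. The sketch below uses a nearest-rank percentile; the function name, the 95th-percentile default, and the sample data are assumptions for illustration, and the article's dashboard may compute its recommendation differently.

```python
import math
from typing import Sequence


def suggest_cpu_request(samples_m: Sequence[int], percentile: float = 95.0) -> int:
    """Nearest-rank percentile of CPU usage samples, in millicores."""
    if not samples_m:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_m)
    # Nearest-rank method: the smallest value covering `percentile` of samples.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical usage history: 100 samples from 10m to 1000m.
history = list(range(10, 1010, 10))
request_m = suggest_cpu_request(history)
```

A request derived this way covers typical busy periods, so the application does not silently depend on bursting into other pods' idle CPU.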

sadops · 1y ago · 1 in thread

Can't your operating system manage your CPU resources for you already? Why does Kubernetes need to be involved in process scheduling?

rblatz · 1y ago

Kubernetes is giving my operating system the cpu to manage.

jauntywundrkind · 1y ago

It's sad to me that Kubernetes doesn't expose the excellent hierarchical system for rationing CPU that's built into the kernel: cgroups. It has its own separate constraint system, makes its own scheduler. And it just seems not as good, not as flexible, as the hierarchical system cgroups offers.

Being able to make a cgroup where essential services as a whole share a pool with a guaranteed 30%, then further refining & trading off that pool & other work pools, feels like such a superpower compared to having to manage all services in flat, absolute terms.
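
The hierarchical idea can be sketched as a tree of weights, in the spirit of the cgroup v2 `cpu.weight` knob: under full contention each group gets its weight divided by the sum of its siblings' weights, applied to its parent's share. The tree layout, group names, and `effective_shares` function are hypothetical, and the resulting figures only hold when every group is busy.

```python
from typing import Any, Dict


def effective_shares(children: Dict[str, Dict[str, Any]],
                     parent_share: float = 1.0,
                     prefix: str = "") -> Dict[str, float]:
    """Return each leaf group's CPU fraction when all groups are busy."""
    total = sum(node["weight"] for node in children.values())
    shares: Dict[str, float] = {}
    for name, node in children.items():
        # A group's share is its parent's share split by sibling weights.
        share = parent_share * node["weight"] / total
        path = f"{prefix}/{name}"
        kids = node.get("children")
        if kids:
            shares.update(effective_shares(kids, share, path))
        else:
            shares[path] = share
    return shares


tree = {
    "essential": {"weight": 30, "children": {
        "dns": {"weight": 1},
        "ingress": {"weight": 2},
    }},
    "batch": {"weight": 70},
}
shares = effective_shares(tree)
```

Here the essential pool as a whole holds 30% under contention, and that 30% is then split 1:2 between its members without touching the batch pool.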

rcarmo · 1y ago

I’d say being able to set I/O limits can be much more useful than CPU limits regardless of platform. Less chance of bringing an entire host to a halt.
