But there is a serious drawback to not setting requests/limits for a mission-critical process: the kernel can kill it to free up resources if the node reaches its maximum resource usage.
When you set requests but no CPU/memory limits, your pod's QoS class is Burstable, which is better than BestEffort, but it still gets assigned a positive `oom_score_adj` score, so it remains an OOM-kill candidate. IMHO, 99% of the time you want `Guaranteed` for a critical process.
1. oom_score_adj - https://kubernetes.io/docs/concepts/scheduling-eviction/node...
2. Guaranteed - https://kubernetes.io/docs/tasks/configure-pod-container/qua...
3. QoS - https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/
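To make the QoS point concrete, here is a rough Python sketch of how the class falls out of requests/limits, and what `oom_score_adj` the docs say each class gets. The `Container` type and function names are made up for illustration, values are plain numbers instead of Kubernetes quantity strings, and init containers and pod-level resources are ignored. The Burstable formula follows the one given in the node-pressure eviction docs; the kubelet's exact clamping may differ slightly.

```python
from dataclasses import dataclass, field

RESOURCES = ("cpu", "memory")


@dataclass
class Container:
    # Hypothetical stand-in for a container spec (not a real client type).
    # CPU is in cores, memory in bytes.
    requests: dict = field(default_factory=dict)
    limits: dict = field(default_factory=dict)


def effective_requests(c):
    # Kubernetes defaults a missing request to the limit for that resource,
    # so explicit requests override limits here.
    return {**c.limits, **c.requests}


def qos_class(containers):
    # BestEffort: nothing set on any container.
    if all(not c.requests and not c.limits for c in containers):
        return "BestEffort"
    # Guaranteed: every container has cpu and memory limits,
    # and each request equals its limit.
    for c in containers:
        req = effective_requests(c)
        for r in RESOURCES:
            if r not in c.limits or req.get(r) != c.limits[r]:
                return "Burstable"
    return "Guaranteed"


def oom_score_adj(qos, memory_request_bytes, node_memory_bytes):
    # Values as documented: lower means less likely to be OOM-killed.
    if qos == "Guaranteed":
        return -997
    if qos == "BestEffort":
        return 1000
    score = 1000 - (1000 * memory_request_bytes) // node_memory_bytes
    return min(max(2, score), 999)
```

So a pod with only requests set is Burstable, and the smaller its memory request relative to the node, the closer its score gets to the BestEffort end.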
It's better to set the limits higher than you need than to not set them at all. Ideally this is easily done since you're profiling/load testing your app and you understand the appropriate sizing for it, right?
The mistake that surprises people is "I'm going to tell my app it can use 128 CPUs but I'm setting the CPU limit to 1." OK, well, then your app is going to be asleep 127/128th of the time. It's a surprise because the app reads /proc/cpuinfo to guess how many CPUs you have, but ... that is not the correct algorithm.
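For what it's worth, a minimal sketch of sizing a worker pool from the cgroup limit instead of the host CPU count might look like this. It assumes cgroup v2, where `cpu.max` holds `<quota> <period>` or `max <period>`; the path and the function names are assumptions for illustration, and cgroup v1 uses different files.

```python
import math
import os


def parse_cpu_max(text):
    """Return the CPU limit in cores from cgroup v2 cpu.max content,
    or None when the quota is 'max' (no limit)."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def worker_count(cpu_max_text, host_cpus):
    """Pick a thread count that respects the CPU limit, if there is one."""
    limit = parse_cpu_max(cpu_max_text)
    if limit is None:
        return host_cpus
    return max(1, min(host_cpus, math.ceil(limit)))


def throttled_fraction(busy_threads, cpu_limit):
    """Rough fraction of time each always-runnable thread spends not running."""
    if busy_threads <= cpu_limit:
        return 0.0
    return 1 - cpu_limit / busy_threads


def default_worker_count(path="/sys/fs/cgroup/cpu.max"):
    # Assumed path; falls back to the host CPU count if the file is missing.
    host_cpus = os.cpu_count() or 1
    try:
        with open(path) as f:
            return worker_count(f.read(), host_cpus)
    except OSError:
        return host_cpus
```

`throttled_fraction(128, 1)` gives 127/128, which is the "asleep most of the time" case: 128 busy threads burn through a 1-CPU quota almost immediately and then wait out the rest of each period.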
Another thing is imagining that usage spikes are going to occur randomly throughout time. If app A is under heavy use, it can steal app B's CPU shares, because who would use app A and app B at the same time? Most of the time they're both idle, so it's a waste to reserve 1 CPU for app A and 1 CPU for app B, and have app A throttled while app B is idle. But you'll probably find that everything you host is popular from 9am to 5pm local time, and for 16 hours a day you are using 0% CPU and for 8 hours a day you are using 200% CPU. The idea is to guarantee some quality of service for both apps, even at busy times. The goal is not to maximize overall throughput.
You can tune your latency vs. throughput goals if all the apps are yours, but as soon as you have different teams, I doubt team B is going to say "sure we can get paged for high request latency as long as Team A is getting as much of the CPU as they can". That's what CPU limits are for: consistency when things get tough, not overall utilization.
We've had success with CPU limits, and horizontal scaling.
Being able to make a cgroup where essential services as a whole share a pool with a guaranteed 30%, then further refining and trading off within that pool and the other work pools, feels like such a superpower compared to having to manage all services in flat, absolute terms.
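A quick sketch of the arithmetic behind that, assuming cgroup v2 `cpu.weight` semantics where siblings split their parent's CPU in proportion to their weights when everyone is busy. The tree layout, names, and weights below are invented for illustration, and the result is only the share under full contention (idle siblings give their time away).

```python
def guaranteed_shares(tree, parent_share=1.0, prefix=""):
    """tree maps name -> (weight, children), where children is the same shape.
    Returns {path: fraction of total CPU under full contention}."""
    total = sum(weight for weight, _ in tree.values())
    shares = {}
    for name, (weight, children) in tree.items():
        path = f"{prefix}/{name}"
        # Each cgroup gets its weight's proportion of the parent's share.
        share = parent_share * weight / total
        shares[path] = share
        shares.update(guaranteed_shares(children, share, path))
    return shares


# Hypothetical hierarchy: essential services get 30% as a group,
# then split that pool 2:1 between two services.
example = {
    "essential": (30, {"api": (200, {}), "db": (100, {})}),
    "batch": (70, {}),
}
```

Here `guaranteed_shares(example)` puts `/essential` at roughly 0.3 of the machine, with `/essential/api` at about 0.2 and `/essential/db` at about 0.1. You can rebalance inside the pool without touching what `/batch` gets.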