Also, it brings nothing of value to the table, but requires a lot of dance around it to keep it going. I.e. if you are a decent DBA, you don't have a problem setting up a node to run your database of choice, you would be probably opposed to using pre-packaged Docker images anyways.
Also, Kubernetes sucks at managing storage... basically, it doesn't offer anything that'd be useful to a DBA. Things that might be useful come as CSI... and, obviously, it's better / easier to not use a CSI, but to interface directly with the storage you want instead.
That's not to say that storage products don't offer these CSI... so, a legitimate question would be why would anyone do that? -- and the answer is -- not because it's useful, but because a lot of people think they need / want it. Instead of fighting stupidity, why not make an extra buck?
If I run a db workload in K8s, it’s a tiny fraction of the operational overhead, and not a massively noticeable performance loss.
I would absolutely love a way to deploy and manage db’s as easily as K8s with fewer of the quite significant issues that have mentioned, so if you know of something that is better behaved around singular workloads, but keeps the simple deploys, the resiliency, the ease of networking and config deployments, the ease of monitoring, etc, I am all ears.
1. fsync. You cannot "divide" it between containers. Whoever does it, stalls I/O for everyone else.
2. Context switches. Unless you do a lot of configurations outside of container runtime, you cannot ensure exclusive access to the number of CPU cores you need.
3. Networking has the same problem. You would either have to dedicate a whole NIC or SRI-OV-style virtual NIC to your database server. Otherwise just the amount of chatter that goes on through the control plane of something like Kubernetes will be a noticeable disadvantage. Again, containers don't help here, they only get in the way as to get that kind of exclusive network access you need more configuration on the host, and, possible an CNI to deal with it.
4. kubelet is not optimized to get out of your way. It needs a lot of resources and may spike, hindering or outright stalling database process.
5. Kubernetes sucks at managing memory-intensive processes. It doesn't work (well or at all) with swap (which, again, cannot be properly divided between containers). It doesn't integrate well with OOM killer (it cannot replace it, so any configurations you make inside Kubernetes are kind of irrelevant, because system's OOM killer will do how it pleases, ignoring Kubernetes).
---
Bottom line... Kubernetes is lame from infrastructure perspective. It's written for Web developers. To make things appear simpler for them, while sacrificing a lot of resources and hiding a lot of actual complexity... which is impossible to hide, and which, in an even of failure will come to bite you. You don't want that kind of program near your database.
- lots of containers running on a single host
- containers are each isolated in a VM (aka virtualized)
- workloads are not homogenous and change often (your neighbor today may not be your neighbor tomorrow)
I believe these are fair assumptions if you’re running on generic infrastructure with kubernetes.
In this setup, my concerns are pretty much noisy neighbors + throttling. You may get latency spikes out of nowhere and the cause could be any of:
- your neighbor is hogging IO (disk or network)
- your database spawned too many threads and got throttled by CFS
- CFS scheduled your DBs threads on a different CPU and you lost your cache lines
In short, the DB does not have stable, predictable performance, which are exactly the characteristics you want it to have. If you ran the DB on a dedicated host you avoid this whole suite of issues.
You can alleviate most of this if you make sure the DB’s container gets the entire host’s resources and doesn’t have neighbors.