Better HN

orbz 2y ago (0 points)

A disturbingly large number of deployments I’ve seen using Kubernetes or docker compose have databases deployed as such.

spockz 2y ago

Given the ability to deploy pods to dedicated nodes based on label selectors, what is the actual performance impact of running a database in a container on a bare metal host with a mounted volume versus running that same process with, say, systemd on that same node? Basically, shouldn’t the overhead of running a container be minimal?

crabbone 2y ago

The problem is kubelet likes to spike in memory / CPU / network usage. It's not a well-behaved program to put alongside a database. It's not written with an eye for resource utilization.

Also, it brings nothing of value to the table, but requires a lot of dancing around to keep it going. I.e., if you are a decent DBA, you don't have a problem setting up a node to run your database of choice, and you would probably be opposed to using pre-packaged Docker images anyway.

Also, Kubernetes sucks at managing storage... basically, it doesn't offer anything that'd be useful to a DBA. Things that might be useful come as CSI... and, obviously, it's better / easier to not use a CSI, but to interface directly with the storage you want instead.

That's not to say that storage products don't offer these CSI drivers... so, a legitimate question would be: why would anyone do that? -- and the answer is -- not because it's useful, but because a lot of people think they need / want it. Instead of fighting stupidity, why not make an extra buck?

FridgeSeal 2y ago

I run DBs on K8s, not because I don’t know what I’m doing, but because most of the trade-offs are worth it.

If I run a db workload in K8s, it’s a tiny fraction of the operational overhead, and not a massively noticeable performance loss.

I would absolutely love a way to deploy and manage DBs as easily as on K8s, with fewer of the quite significant issues that you have mentioned, so if you know of something that is better behaved around singular workloads but keeps the simple deploys, the resiliency, the ease of networking and config deployments, the ease of monitoring, etc., I am all ears.

lokar 2y ago

If you care about perf you would pin the kubelet and all other overhead workload to one core, and mask that off for your workload.
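As a rough illustration of the pinning idea above, here is a minimal Python sketch that splits a host's CPUs into one housekeeping core and a set of workload cores, then restricts a process to one of those sets. The names `split_cpus` and `pin_process` are made up for this example, and it assumes Linux, where `os.sched_setaffinity` is available. In a real cluster you would more likely reach for the kubelet's own CPU reservation settings or for cpusets than for a script like this.

```python
import os


def split_cpus(available, housekeeping_count=1):
    """Split CPU ids into (housekeeping, workload) sets.

    The lowest-numbered CPUs are reserved for housekeeping (kubelet,
    container runtime, other daemons); the rest are left for the database.
    """
    cpus = sorted(available)
    if not 0 < housekeeping_count < len(cpus):
        raise ValueError("need at least one CPU on each side of the split")
    return set(cpus[:housekeeping_count]), set(cpus[housekeeping_count:])


def pin_process(pid, cpus):
    """Restrict a running task to the given CPUs (Linux only).

    The affinity applies to the thread identified by `pid`; threads that
    already exist in a multi-threaded process must be pinned one by one,
    while threads and children created afterwards inherit the mask.
    """
    os.sched_setaffinity(pid, cpus)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    visible = os.sched_getaffinity(0)  # CPUs this process may run on
    if len(visible) > 1:
        housekeeping, workload = split_cpus(visible)
        print("housekeeping cores:", sorted(housekeeping))
        print("workload cores:", sorted(workload))
        # Hypothetical usage: pin_process(kubelet_pid, housekeeping)
```

Affinity alone only keeps the pinned task off the workload cores; it does not stop other processes from being scheduled there, which is why the comment talks about masking that core off as well.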

danappelxx 2y ago

IMO if you’re concerned about performance and yet are deploying databases this way — mmap should not even be on the radar.

charcircuit 2y ago

How would containers even hurt performance? How does the database no longer having the ability to see other processes on the machine somehow make it slower?

crabbone 2y ago

There are many "holes" in these containers.

1. fsync. You cannot "divide" it between containers. Whoever does it, stalls I/O for everyone else.

2. Context switches. Unless you do a lot of configuration outside of the container runtime, you cannot ensure exclusive access to the number of CPU cores you need.

3. Networking has the same problem. You would have to dedicate either a whole NIC or an SR-IOV-style virtual NIC to your database server. Otherwise just the amount of chatter that goes on through the control plane of something like Kubernetes will be a noticeable disadvantage. Again, containers don't help here; they only get in the way, because to get that kind of exclusive network access you need more configuration on the host and, possibly, a CNI plugin to deal with it.

4. kubelet is not optimized to get out of your way. It needs a lot of resources and may spike, hindering or outright stalling the database process.

5. Kubernetes sucks at managing memory-intensive processes. It doesn't work (well or at all) with swap (which, again, cannot be properly divided between containers). It doesn't integrate well with the OOM killer (it cannot replace it, so any configuration you make inside Kubernetes is kind of irrelevant, because the system's OOM killer will do as it pleases, ignoring Kubernetes).

---

Bottom line... Kubernetes is lame from an infrastructure perspective. It's written for Web developers, to make things appear simpler for them, while sacrificing a lot of resources and hiding a lot of actual complexity... which is impossible to hide, and which, in the event of failure, will come back to bite you. You don't want that kind of program near your database.
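One way to check the fsync point from the list above on your own hosts is to measure it. The following is a minimal Python sketch, not a benchmark: it writes a small block to a temporary file and times each `os.fsync` call. The function name `fsync_latencies` and its parameters are invented for this example. Running it inside the database container while a neighbouring container does heavy writes to the same device would show whether the stalls described above appear on a given setup.

```python
import os
import tempfile
import time


def fsync_latencies(directory, rounds=20, block_size=4096):
    """Write one block and fsync it, `rounds` times.

    Returns the wall-clock duration of each fsync call in seconds.
    """
    latencies = []
    payload = b"\0" * block_size
    # The temporary file is deleted automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(rounds):
            f.write(payload)
            f.flush()  # move data from Python's buffer into the kernel
            start = time.perf_counter()
            os.fsync(f.fileno())  # block until the kernel reports it durable
            latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    samples = sorted(fsync_latencies(tempfile.gettempdir()))
    median_ms = samples[len(samples) // 2] * 1000
    print(f"median {median_ms:.2f} ms, max {samples[-1] * 1000:.2f} ms")
```

Point `directory` at the volume the database actually uses; the system temp directory is only a default for the demo and may sit on a different device or on tmpfs, where fsync is nearly free.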

danappelxx 2y ago

I’ll assume the worst case:

- lots of containers running on a single host

- containers are each isolated in a VM (aka virtualized)

- workloads are not homogeneous and change often (your neighbor today may not be your neighbor tomorrow)

I believe these are fair assumptions if you’re running on generic infrastructure with kubernetes.

In this setup, my concerns are pretty much noisy neighbors + throttling. You may get latency spikes out of nowhere and the cause could be any of:

- your neighbor is hogging IO (disk or network)

- your database spawned too many threads and got throttled by CFS

- CFS scheduled your DB’s threads on a different CPU and you lost your cache lines

In short, the DB does not have stable, predictable performance, which is exactly the characteristic you want it to have. If you run the DB on a dedicated host, you avoid this whole suite of issues.

You can alleviate most of this if you make sure the DB’s container gets the entire host’s resources and doesn’t have neighbors.
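The CFS throttling case in the list above is one of the few causes you can read directly from the kernel. On a cgroup v2 host, a cgroup's `cpu.stat` file reports `nr_periods`, `nr_throttled` and `throttled_usec` when the CPU controller is enabled for it. Here is a minimal Python sketch of reading those counters. The helper names are invented for this example, and the path `/sys/fs/cgroup/cpu.stat` assumes cgroup v2 as seen from inside the container; adjust it for your environment.

```python
from pathlib import Path


def parse_cpu_stat(text):
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats):
    """Fraction of enforcement periods in which the cgroup was throttled.

    Returns 0.0 when no periods are recorded, e.g. when no CPU limit is set.
    """
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    # Assumed location for cgroup v2 as seen from inside a container.
    path = Path("/sys/fs/cgroup/cpu.stat")
    if path.exists():
        stats = parse_cpu_stat(path.read_text())
        print(f"throttled in {throttled_fraction(stats):.1%} of periods")
        print(f"total throttled time: {stats.get('throttled_usec', 0)} us")
```

These counters are cumulative, so sampling them twice and comparing the difference against a latency spike is more telling than a single reading. They say nothing about the noisy-neighbor I/O or the cache-line cases, which need other tools.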
