You can spend a lot of time getting databases and other stateful workloads to work -- mess around with StatefulSet and PVC on top of all the normal Kubernetes concepts, and what do you get in the end? Are you really better off than you would have been if you ran the database in EC2?
Plus, "herds not pets" kind of breaks down once you start using StatefulSets and PVCs. Those things exist to make Kubernetes more like a static environment for workloads that can't handle being run like ephemeral cattle. So why not just keep using your static environment?
If Kubernetes is the only workload management control plane you have, then I guess this makes sense. But if you are already able to deploy your databases with existing tools, and those existing tools don't really suck, it's probably not worth migrating. It would take a lot of time and introduce significant new risks and operational complexity without a compensating payoff.
Where I was, the tooling was very focused on disposable API servers.
You need HA because your k8s cluster should already be running with automatic node upgrades.
You need a pod disruption budget to make sure it is running and switching over when a node fails or gets upgraded.
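To make the pod disruption budget point concrete, here is a minimal sketch that builds a `policy/v1` PodDisruptionBudget manifest as a plain dict. The names `db-pdb` and `my-db` and the `app` label are made up for illustration. Note that, as far as I understand it, a PDB only constrains voluntary evictions such as node drains; it does nothing for a node that simply dies.

```python
import json


def pod_disruption_budget(name: str, app_label: str, min_available: int) -> dict:
    """Build a policy/v1 PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            # Evictions are refused if they would leave fewer healthy pods than this.
            "minAvailable": min_available,
            # Hypothetical label; it must match the labels on your database pods.
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def eviction_allowed(healthy_pods: int, min_available: int) -> bool:
    """Simplified model of the check: one eviction must not breach minAvailable."""
    return healthy_pods - 1 >= min_available


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be saved and applied as a manifest.
    print(json.dumps(pod_disruption_budget("db-pdb", "my-db", 2), indent=2))
```

With three healthy replicas and `minAvailable: 2`, one pod can be evicted at a time; with only two healthy, the drain has to wait.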
You want to either massively overprovision memory or look into KEP 2400, so that you can fine-tune memory before k8s starts evicting your database constantly.
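One thing that bears on the memory point is the pod's QoS class, which Kubernetes derives from the requests and limits you set. My understanding is that under node memory pressure, Guaranteed pods are the last candidates for eviction. A rough sketch of the classification rules, with quantities compared as plain strings (a simplification; the real thing parses values like `8Gi`):

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort.

    Each container is a dict with optional "requests" and "limits" dicts
    keyed by "cpu" and "memory". Quantities are compared as strings here.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An omitted request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    db = {"requests": {"cpu": "2", "memory": "8Gi"},
          "limits": {"cpu": "2", "memory": "8Gi"}}
    print(qos_class([db]))
```

For a database pod, that suggests setting memory and CPU requests equal to limits on every container, sidecars included.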
K8s is not a VM.
If you use k8s and still don't take care of application migration strategies you still don't understand what cloud native means.
There are still other things missing here but still...
Of course excluding hobby people playing with k8s.
Memory and node upgrades are the two most common issues you will see that disrupt service.
Otherwise k8s is a dream come true.
I would still try to use a managed db if it's critical.
Additional points: Zalando postgres operator is great and shows the real magic of k8s and operators.
Use a helm chart and just bring your own little database for dev, test, and e2e tests.
You can easily use Auto scaling for node profiles. No noisy neighbors. If your db is too small for normal nodes you don't have a problem anyway.
This is difficult to impossible to do with databases; even if your database has a built-in recovery method for when a primary is taken offline, in such a way that allows for zero-downtime in theory, the reality is that such mechanisms depend on the secondary staying online until the failover mechanism is complete. If you turn over control of node upgrades to the cluster provider, the node under the secondary will get rebooted in the middle of the failover process, and you will get downtime at best, data loss at worst. What kubernetes teaches us is that databases aren't tied to the literal VM they're running on (which is now cattle), but rather on the availability of that node. If you run databases on kubernetes, you need to have a mechanism to slow down node upgrades.
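A rough sketch of what "a mechanism to slow down node upgrades" could look like as a gate in front of each drain. Everything here (`Replica`, `can_drain`, the `failover_in_progress` flag) is hypothetical and not any provider's API; in a real cluster the signals would come from the database or its operator.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    node: str    # node the replica is scheduled on
    role: str    # "primary" or "secondary"
    ready: bool  # True once the replica is caught up and serving


def can_drain(node: str, replicas: list[Replica], failover_in_progress: bool) -> bool:
    """Decide whether the upgrade process may drain `node` right now."""
    # Never touch any node while a failover is still running.
    if failover_in_progress:
        return False
    # If any replica is still recovering, the cluster has no spare capacity.
    if any(not r.ready for r in replicas):
        return False
    # At least one replica must remain on a different node.
    return any(r.node != node for r in replicas)


if __name__ == "__main__":
    cluster = [Replica("node-a", "primary", True), Replica("node-b", "secondary", True)]
    print(can_drain("node-b", cluster, failover_in_progress=False))
    print(can_drain("node-b", cluster, failover_in_progress=True))
```

The point is only that the upgrade has to ask the database first, rather than proceeding on its own timer.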
Source: helped run hundreds of Elasticsearch and Kafka nodes on kubernetes in production at one point in my career
I've done it with Cassandra...and yeah Kafka can do it I've heard.
But those can be 30-hour operations even with your ducks in a row, and you'd better have backup strategies ready.
Fun story: Amazon said RDS upgrades would always be zero-downtime. But then came a major version upgrade and... surprise, it wasn't.
CloudNativePG rebuilds the secondary during failover from the WAL streamed to an S3 endpoint. No primary needed.
dev, test, and e2e tests should be done against full-size db clones
Real customer/sensitive data should not exist outside prod (and backup). So generally no, not full-size clones. I'd argue instrumentation in prod should give information on performance - for some tests/development you might need prod-size fake data.
that's cute, what is your "full-size"? I don't have 2 days to run a test, and I'm pretty sure every single compliance requirement we are following would get obliterated the second someone hears about us doing that
Does anyone have any specific recommendations on what to use (like which operator) when setting up a postgres cluster on k8s, specifically for standby replication?
We've been running Zalando Postgres Operator for all our prod and dev clusters (around 100 in total) and couldn't be happier.
- Can't set up two separate clusters in the same kubernetes instance because some cluster specific configuration is inexplicably set globally in the operator.
- Documentation and error messages are cryptic. Have to do a lot of trial and error to compensate for that. Maybe the issue here is a lack of experience with the stack used, like Spilo and Patroni.
We had a couple unusual requirements that the operator wasn't really suited for, so we ultimately ended up writing our own helm chart and forgoing the operator route altogether
I've tried other Postgres operators and been disappointed, and it did require a little learning, but it's not like getting replication, Patroni, etcd, PgBouncer, HAProxy, and pgBackRest all running for a high-availability Postgres deployment is easy and wouldn't require learning.
As the author says, "[k8s's] operator model allows end users to programmatically manage their workloads by writing code against the core k8s APIs to automatically perform tasks that would previously have to be done manually." To me, that's the benefit. The operator can handle tasks like adding a replica or failing over the primary to one of the replicas. I could presumably do some of that with other tools on bare metal/VMs (I can always shell-script things), but I've had a good experience with CloudNativePG's operator. Likewise, as the author says, making day-2 operations easier is a big thing.
K8s does have some annoying amount of complexity, but it's been nice overall.
I worked in this problem space extensively until 2020, and I think that there are paths forward but they require changes in K8S that none of the folks involved seem motivated to make. Realistically to make databases in K8S work well today you need a database built for K8S rather than one adapted for K8S.
The building blocks present today are not fundamentally capable of building a positive UX for adapting existing databases to K8S, but this is something that is worth making possible and I hope the community gets there some day.
Is https://kubernetes.io/docs/concepts/workloads/controllers/st... unsuitable for that?
This will be a problem for any database where clustering is synchronous and a specific primary node must start first on a full cluster restart. There are other out of band hacks you can do with reassigning PVCs, but it’s never elegant in the current primitives provided.
During my work in this problem space I became convinced that primitives for stateful applications in K8S were built specifically without considering databases as a valid use case. Everything else is just hacks after the fact to make it “work”.
Can you say what you see as those possible paths forward and what changes they would require?
I won't disagree with others that RDS is probably worth it until you need something very specific or have reached a certain scale.
Happy to share tips or pointers for anyone going down this path specifically with MySQL or database workloads in general.
P.S. And how do you deal with migrations? P.P.S. Forgive me if I'm asking for too much!
There is a main operator responsible for all the databases. It handles configuration changes, provisioning pods and slowly rolling out changes. In kube we model this with a custom resource we've defined called a KeyspaceShard which represents a named set of database instances that should participate in replication together. Once provisioned, the pods know how to hook up and detach from Vitess without requiring further involvement from the operator. Vitess handles backups and maintains the replication topology. "Complicated" is an apt description of what it does but not "complex". Evicting a database pod and letting the system reschedule and converge is a routine operation that doesn't cause much concern.
Migrations are done with gh-ost, which has its own custom operator that manages the lifecycle of the migration and ties into self service tooling we provide that is integrated with our build and deploy system.
The only thing I can think of is cost. My usage probably isn't high enough where there is any financial benefit to an alternative... but if it was, maybe I'd be considering this.
When you start to have dozens or hundreds of databases in production, and developers asking "I need Postgres in production, why can't I just click a button and get a Postgres instance for my service in production?" then scaling the monitoring and firewalling gets a little more complicated. Hooking into standard Kubernetes monitoring and service meshes can really help to simplify things.
That phrase is inaccurate. With the cloud and K8s, the pets move from being software that is tightly tied to the hardware to being a collection of configurations and software that are tightly tied to themselves.
We just make the actual physical hardware anonymous. But from the perspective of the actual stack, there is still a server with its cpus, filesystem, i/o and everything.
"Pets that you can carry" is more like it.
In 2019, every operator had crazy bugs, and we inherited all of them. You have to solve not just database-level errors but also errors popping up from operators. If you can avoid databases on kubernetes, you should just do it.
It seems to get worse the further down the stack you go. I’ve seen tons of problems with operators, monitoring tools, and CNIs.
It’s somewhat better now but there is still a lot of stuff you can’t depend on. The CNCF seems to endorse pretty much anything even if it’s crap.
K8s, if anything, is an API. An API that allows you to interact with compute, storage and networks in a way that is abstracted from the actual underlying infrastructure. This is incredibly powerful. You can, essentially, code and automate all your infrastructure.
But this goes beyond deployment, something you could achieve (more or less) with tools like Terraform or Pulumi. Enter "Day 2 operations".
Day 2 operations are essential for any database. And cloud services have done a good job at automating them. Speaking of Postgres, my daily job, things like HA, backups but also minor and major version upgrades are table stakes day 2 operations.
If you want to build these day 2 operations in the cloud (say on VMs), even though you have APIs to do so, a) they don't implement a pattern like Kubernetes' reconciliation cycle; and b) you have a distinct API per cloud. K8s solves both problems, making it way "cheaper" to build such automation. On K8s, a given operator can code these day 2 operations against K8s APIs. Therefore, if you want to build such automation, either you are a cloud provider (and potentially do this only for your own cloud) or you do it on Kubernetes.
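For anyone unfamiliar with the reconciliation cycle, here is a toy sketch of the pattern: compare desired state with observed state, take one corrective step, observe again, and repeat until they match. `ClusterState`, `step`, and `converge` are illustrative names only, not part of any operator or of the Kubernetes API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterState:
    replicas: int
    version: str


def step(desired: ClusterState, observed: ClusterState) -> ClusterState:
    """Take one corrective action and return the new observed state."""
    if observed.version != desired.version:
        return replace(observed, version=desired.version)
    if observed.replicas < desired.replicas:
        return replace(observed, replicas=observed.replicas + 1)
    if observed.replicas > desired.replicas:
        return replace(observed, replicas=observed.replicas - 1)
    return observed


def converge(desired: ClusterState, observed: ClusterState,
             max_steps: int = 100) -> tuple[ClusterState, int]:
    """Reconcile repeatedly until observed matches desired or steps run out."""
    steps = 0
    while observed != desired and steps < max_steps:
        observed = step(desired, observed)
        steps += 1
    return observed, steps


if __name__ == "__main__":
    final, steps = converge(ClusterState(3, "16"), ClusterState(1, "15"))
    print(final, steps)
```

A real operator does the same thing against live resources: each step is idempotent, so if it crashes halfway it just picks up from whatever state it observes next.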
This is so true that existing operators have already gone beyond what DBaaS offerings do. Speaking of StackGres [0] (disclaimer: founder), we have implemented day 2 operations (other than the "table stakes" ones that I mentioned before) that no other DBaaS offers as of today, such as vacuums, repacks and even benchmarks (and more day 2 operations will be developed). See [1] for the CRD specs of SGDbOps, our "Day 2 operations", if you are interested.
[0] https://stackgres.io
[1] https://stackgres.io/doc/latest/reference/crd/sgdbops/
I have seen it fail way too many times. Inspecting a failing deployment that now has some magic Go code someone wrote running on this cluster. I can see using the basic kube building blocks: deployments, pods, config maps, etc.; there are enough guides and tools to help you out. As soon as you start writing code that runs in there, you're now dealing with two problems: your actual thing you're deploying, and now the operator.
Well, and then you need a mesh, and a way to manage certificates. and if it's a database to manage all the volumes. Everything looks good at the architect level - all the boxes and arrows line up, but when it breaks in production it's a nightmare to debug.
After switching to the Crunchy Data pg operator v5 on k8s, we've had close to zero problems. Once or twice a year the log shipping / HA replication fails and we have to restart it, but it's really neat! I can *warmly* recommend it; it really is CloudSQL in K8S.