Running Databases on Kubernetes

(questdb.io)

145 points · sklarsa · 3y ago · 75 comments

52 comments · 15 top-level

_skel3y ago· 10 in thread

I don't think the upsides are worth all the work.

You can spend a lot of time getting databases and other stateful workloads to work -- mess around with StatefulSet and PVC on top of all the normal Kubernetes concepts, and what do you get in the end? Are you really better off than you would have been if you ran the database in EC2?

Plus, "herds not pets" kind of breaks down once you start using StatefulSets and PVCs. Those things exist to make Kubernetes more like a static environment for workloads that can't handle being run like ephemeral cattle. So why not just keep using your static environment?

If Kubernetes is the only workload management control plane you have, then I guess this makes sense. But if you are already able to deploy your databases with existing tools, and those existing tools don't really suck, it's probably not worth migrating. It would take a lot of time and introduce significant new risks and operational complexity without a compensating payoff.

annexrichmond3y ago

yeah but if your org has orchestration tooling built around k8s, in a way it becomes much easier to provision a DB with k8s, set up the service, routing, networking, roles, etc. than it would be in terraform. especially if you have to repeat this process in like 20 envs (stage, prod) x multiple regions

stonogo3y ago

This sounds dangerously close to "yeah but if the only tool you know is a hammer..."

AtlasBarfed3y ago

If (big if) your org's orchestration supports StatefulSets.

Where I was the tooling was very focused on disposable api servers.

marcinzm3y ago

Can’t you just use this then: https://aws.amazon.com/blogs/containers/aws-controllers-for-...

1 more reply

superyesh3y ago

+1 Sometimes just because you can does not mean you should.

barrkel3y ago

What if you have finance customers who don't like commingled data, and you want to sell them a service and tell them with a straight face that their tenanted database isn't one bad query from serving up their data to someone else?

yesbabyyes3y ago

You can still have separate ACLs, databases, tables and even row level access control even if you share database servers.

2 more replies

potamic3y ago

It's also more resource efficient, especially for non-production or non-critical workloads. VMs only come in discrete configurations and many times even the smallest one is too big, wasting a lot of resource. When you run thousands of instances, thanks to the magic of microservices, the costs add up.

twblalock3y ago

Those are good arguments for ephemeral workloads but they don't make as much sense for databases.

dboreham3y ago

Another turtle.

Felminor3y ago· 8 in thread

That's just a really, really bad write-up of the real problems with running a database on k8s.

You need HA because k8s should already be running with automatic node upgrades.

You need a pod disruption budget to make sure it is running and switching over when a node fails or gets upgraded.

You want to either totally overprovision memory or look into KEP 2400 to fine-tune memory before k8s starts throwing your database out constantly.

K8s is not a VM.

If you use k8s and still don't take care of application migration strategies you still don't understand what cloud native means.

There are still other things missing here but still...

Of course excluding hobby people playing with k8s.

Memory and node upgrades are the two biggest issues you will see disrupting service.

Otherwise k8s is a dream come true.

I would still try to use a managed db if it's critical.

Additional points: Zalando postgres operator is great and shows the real magic of k8s and operator.

Use a helm chart and just bring your own little database for dev test and e2e tests.

You can easily use Auto scaling for node profiles. No noisy neighbors. If your db is too small for normal nodes you don't have a problem anyway.
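To make the memory point above concrete, here is a minimal sketch of how Kubernetes assigns a pod's QoS class from its resource requests and limits. Pods in the Guaranteed class (requests equal to limits for both CPU and memory in every container) are generally the last candidates for eviction under node memory pressure, which is why databases are usually sized that way. The function name `qos_class` and the dict layout are assumptions for illustration, not a Kubernetes API, and quantities are compared as plain strings.

```python
from typing import Dict, List

# Hypothetical shape: {"requests": {"cpu": "2", ...}, "limits": {"cpu": "2", ...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a pod roughly the way Kubernetes assigns QoS classes.

    Simplification: quantities are compared as strings, so "1" and "1000m"
    are treated as different even though Kubernetes considers them equal.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # When only a limit is given, the request defaults to the limit.
        requests = {**limits, **container.get("requests", {})}
        if limits or requests:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            if limit is None or requests.get(resource) != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

A database pod with a memory request of 4Gi and a limit of 8Gi comes out as Burstable here; setting both to the same value (and doing the same for CPU) moves it to Guaranteed.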

solatic3y ago

> k8s should run already with automatic node upgrades

This is difficult to impossible to do with databases; even if your database has a built-in recovery method for when a primary is taken offline, in such a way that allows for zero downtime in theory, the reality is that such mechanisms depend on the secondary staying online until the failover mechanism is complete. If you turn over control of node upgrades to the cluster provider, the node under the secondary will get rebooted in the middle of the failover process, and you will get downtime at best, data loss at worst. What kubernetes teaches us is that databases aren't tied to the literal VM they're running on (which is now cattle), but rather to the availability of that node. If you run databases on kubernetes, you need to have a mechanism to slow down node upgrades.

Source: helped run hundreds of Elasticsearch and Kafka nodes on kubernetes in production at one point in my career

AtlasBarfed3y ago

Online lossless zero downtime upgrades?

I've done it with Cassandra...and yeah Kafka can do it I've heard.

But those can be 30 hour operations even with your ducks in a row, and you better have backup strategies ready.

Fun story, Amazon said RDS would always be zero downtime upgrades. But then came a major version upgrade and .... Surprise it wasn't.

1 more reply

redrove3y ago

Your claim about needing the primary online to fail over to the secondary is untrue, at least for some Postgres operators.

CloudNativePG rebuilds the secondary during failover from the WAL streamed to an S3 endpoint. No primary needed.

GauntletWizard3y ago

Kubernetes does have this capability - Pod Disruption Budgets. They're underutilized and under tested, but at least the default cluster autoscaler respects them and will avoid destroying nodes that would break that constraint.
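For readers who have not used one, a Pod Disruption Budget is a small object. The sketch below builds a `policy/v1` PodDisruptionBudget manifest as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The helper name, the `app` label key, and the names `pg-pdb` and `postgres` are assumptions for illustration; adjust the selector to whatever labels your database pods actually carry.

```python
import json


def pod_disruption_budget(name: str, app_label: str, max_unavailable: int = 1) -> dict:
    """Build a PodDisruptionBudget that allows at most `max_unavailable`
    matching pods to be down due to voluntary disruptions (e.g. node drains)."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "maxUnavailable": max_unavailable,
            # Assumed label; must match the labels on the database pods.
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(pod_disruption_budget("pg-pdb", "postgres"), indent=2))
```

Note that a budget only constrains voluntary disruptions such as drains during node upgrades; it does nothing for a node that simply fails.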

1 more reply

samokhvalov3y ago

> Use a helm chart and just bring your own little database for dev test and e2e tests.

dev, test, and e2e tests should be done against full-size db clones

e12e3y ago

> dev, test, and e2e tests should be done against full-size db clones

Real customer/sensitive data should not exist outside prod (and backup). So generally no, not full-size clones. I'd argue instrumentation in prod should give information on performance - for some tests/development you might need prod-size fake data.

2 more replies

axlee3y ago

> dev, test, and e2e tests should be done against full-size db clones

that's cute, what is your "full-size"? I don't have 2 days to run a test, and I'm pretty sure every single compliance requirements we are following would get obliterated the second someone hears about us doing that

2 more replies

Blackthorn3y ago

You think I'm going to clone a multiple petabyte database just to run some tests?

2 more replies

worldsayshi3y ago· 4 in thread

I've recently worked with putting postgres into kubernetes using the zalando operator. The impression has been such a mixed bag that it looks like we need to start over with some other operator. When we run into problems the documentation, error messages and configuration structure have been quite cryptic.

Does anyone have any specific recommendations on what to use (like which operator) when setting up a postgres cluster on k8s, specifically for standby replication?

erulabs3y ago

I would look at https://kubedb.com/ - operators are a mixed bag - but a bad operator can be a painful intro to K8s, that's for sure.

tiimbz3y ago

What type of issues did you run into?

We've been running Zalando Postgres Operator for all our prod and dev clusters (around 100 in total) and couldn't be happier.

worldsayshi3y ago

My impression is that when it works it works well but when it doesn't it doesn't help you that much. We have had two main issues:

- Can't set up two separate clusters in the same kubernetes instance because some cluster specific configuration is inexplicably set globally in the operator.

- Documentation and error messages are cryptic. Have to do a lot of trial and error to compensate for that. Maybe the issue here is a lack of experience with the stack used. Like Spilo and Patroni.

tommyzli3y ago

the last time I gave the Postgres operator space a serious look was about a year ago, and at the time the Zalando operator was far and away the most feature complete and mature.

We had a couple unusual requirements that the operator wasn't really suited for, so we ultimately ended up writing our own helm chart and forgoing the operator route altogether

mdasen3y ago· 3 in thread

I've been quite happy with CloudNativePG on k8s. It was simple for me to set up on a k8s cluster with one primary and two replicas, to have another instance become primary if the primary box goes down, to deal with connection pooling, and to have backups go to a cloud object store. The alternative is dealing with all the replication manually, making sure that your leader election and failover work, making sure you can stand up new PG instances and get things replicated to the new instance, having a service that is checking the health of the database to trigger a failover, etc. It's certainly not impossible or anything like that, but CloudNativePG has been pretty easy. K8s isn't perfect or anything, but it's been a pretty nice experience for me.

I've tried other Postgres operators and been disappointed and it did require a little learning, but it's not like getting replication, Patroni, etcd, PGBouncer, HAProxy, and pgBackRest all running for a high-availability Postgres deployment is easy and wouldn't require learning.

As the author says, "[k8s's] operator model allows end users to programmatically manage their workloads by writing code against the core k8s APIs to automatically perform tasks that would previously have to be done manually." To me, that's the benefit. The operator can handle tasks like adding a replica or failing over the primary to one of the replicas. I could presumably do some of that with other tools on bare metal/VMs (I can always shell-script things), but I've had a good experience with CloudNativePG's operator. Likewise, as the author says, making day-2 operations easier is a big thing.

K8s does have some annoying amount of complexity, but it's been nice overall.

docandrew3y ago

This is really the secret - once someone figures out how to tie all the different k8s concepts into a functioning system, you can just copy it and put it on your cluster and it will probably work. Trying to figure it all out the first time is the messy part. If there’s an operator or Helm chart or something that does what you want there’s no shame in using it!

akdor11543y ago

That's kind of the issue with the db question - for your app logic, 'will probably work' is fine. Playing fast and loose with your database is less enticing - you more likely want to understand every bit of the stack between you and your db, or if not, at least have a support line to whinge at if you hit trouble.

1 more reply

ikiris3y ago

Thanks for the info, I had not seen this one before.

tristor3y ago· 3 in thread

StatefulSet and PVCs aren’t sufficient to fully handle all the likely resilience challenges of running a database cluster on K8S. There needs to be some rethinking on how StatefulSet works to make it more appropriate to this use case, such as allowing Pods to be started out of order when recovering from failures.

I worked in this problem space extensively until 2020, and I think that there are paths forward but they require changes in K8S that none of the folks involved seem motivated to make. Realistically to make databases in K8S work well today you need a database built for K8S rather than one adapted for K8S.

The building blocks present today are not fundamentally capable of building a positive UX for adapting existing databases to K8S, but this is something that is worth making possible and I hope the community gets there some day.

smarterclayton3y ago

Re out of order:

Is https://kubernetes.io/docs/concepts/workloads/controllers/st... unsuitable for that?

tristor3y ago

Unfortunately, while rolling updates account for some scenarios, they are not sufficient for handling out of order restarts where the order cannot be pre-determined. There’s probably some hack you could build with partitioning to mostly address the cases I am thinking of, but it isn’t elegant or guaranteed correct.

This will be a problem for any database where clustering is synchronous and a specific primary node must start first on a full cluster restart. There are other out of band hacks you can do with reassigning PVCs, but it’s never elegant in the current primitives provided.

During my work in this problem space I became convinced that primitives for stateful applications in K8S were built specifically without considering databases as a valid use case. Everything else is just hacks after the fact to make it “work”.
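A rough sketch of the ordering problem described above, assuming each pod can report the last transaction position it committed before the outage (how that is obtained is database-specific and not shown). The pod names, the positions, and both function names are hypothetical.

```python
from typing import Dict, List


def restart_order(last_committed: Dict[str, int]) -> List[str]:
    """Order pods for a full-cluster restart: the pod with the most recent
    committed position starts first so it can act as primary. Ties are
    broken by pod name to keep the result deterministic."""
    return sorted(last_committed, key=lambda pod: (-last_committed[pod], pod))


def ordinal_order(pods: List[str]) -> List[str]:
    """Order pods by ascending ordinal suffix, which is how a StatefulSet
    with the default ordered pod management brings them up."""
    return sorted(pods, key=lambda pod: int(pod.rsplit("-", 1)[1]))


positions = {"db-0": 1041, "db-1": 1057, "db-2": 1050}
print(restart_order(positions))        # order the database needs
print(ordinal_order(list(positions)))  # order the StatefulSet uses
```

The mismatch between the two lists is the point: the order the database needs is only known at recovery time, while the ordinal order is fixed up front.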

2 more replies

bogomipz3y ago

>"I worked in this problem space extensively until 2020, and I think that there are paths forward but they require changes in K8S that none of the folks involved seem motivated to make."

Can you say what you see as those possible paths forward and what changes they would require?

bwarminski3y ago· 2 in thread

This article does a great job describing the investment required to pull this off. At HubSpot, my team is running a large Vitess/MySql deployment (500+ distinct databases, some sharded, multi-region) atop k8s today and had to learn a lot of those same lessons and primitives. We opted to write our own operator(s) to do it. In the end, the investment has paid off in terms of being able to build self service functionality for the rest of the business and write the kinds of tools and workflows that allow us to support it with a relatively small team. The value is in the operator pattern itself and being able to manipulate things on a common control plane. Compared to the alternative of managing this with Terraform and Puppet/Ansible/Chef directly on EC2, which I've also done before, it's a better experience and much more maintainable even at the fixed expense of additional training and tooling.

I won't disagree with others that RDS is probably worth it until you need something very specific or have reached a certain scale.

Happy to share tips or pointers for anyone going down this path specifically with MySql or database workloads in general.

matesz3y ago

The first question which comes to my mind is what are performance implications of running database like you do inside k8s vs EC2 vs bare metal? And how did you solve multitenancy? Does the operator handle lifecycle of database per customer simply or is it something more complicated?

P.S. And how do you deal with migrations? P.P.S. Forgive me if I'm asking for too much!

bwarminski3y ago

No worries, happy to share more details. For the databases where performance is a concern, we use constraints and reservation requests to all but guarantee it will be the only tenant on the node and we actively monitor CPU throttle and will autoscale in cases where it is sustained for a long period of time. We're actually achieving better overall utilization with this setup vs bare metal and aren't dealing with a lot of issues with resource contention.

There is a main operator responsible for all the databases. It handles configuration changes, provisioning pods and slowly rolling out changes. In kube we model this with a custom resource we've defined called a KeyspaceShard which represents a named set of database instances that should participate in replication together. Once provisioned, the pods know how to hook up and detach from Vitess without requiring further involvement from the operator. Vitess handles backups and maintains the replication topology. "Complicated" is an apt description of what it does but not "complex". Evicting a database pod and letting the system reschedule and converge is a routine operation that doesn't cause much concern.

Migrations are done with gh-ost, which has its own custom operator that manages the lifecycle of the migration and ties into self service tooling we provide that is integrated with our build and deploy system.

2 more replies

vcryan3y ago· 2 in thread

I guess there must be a use case I'm missing here, but RDS is working so well for me, it's hard to imagine why I would not shift most of the operational concerns to this competent vendor.

The only thing I can think of is cost. My usage probably isn't high enough where there is any financial benefit to an alternative... but if it was, maybe I'd be considering this.

yjftsjthsd-h3y ago

I mean, yeah, cost is kind of the problem with AWS, especially for large amounts of data. Do your own cost/benefit of course, but for some of us it's a non-starter.

solatic3y ago

If you have a small project and you only need one or two RDS databases in production, stick with RDS. The cost isn't that high and you save a ton in aggravation. Yes you need to set up separate monitoring, firewalling, etc. but it's really not a big deal.

When you start to have dozens or hundreds of databases in production, and developers asking "I need Postgres in production, why can't I just click a button and get a Postgres instance for my service in production?" then scaling the monitoring and firewalling gets a little more complicated. Hooking into standard Kubernetes monitoring and service meshes can really help to simplify things.

advisedwang3y ago· 2 in thread

Not to diminish the product that QuestDB is working on, but another solution that works very well with Kubernetes is Vitess. Vitess is basically sharded MySQL, but it automatically manages this very well and has built in kubernetes support so it really handles the "pets to cattle" thing well.

unity10013y ago

> "pets to cattle"

That phrase is inaccurate. With the cloud and K8, the pets move from being software that is tightly tied to the hardware to being a collection of configurations and software that are tightly tied to themselves.

We just make the actual physical hardware anonymous. But from the perspective of the actual stack, there is still a server with its cpus, filesystem, i/o and everything.

"Pets that you can carry" is more like it.

nly3y ago

I really like this analogy

middle-marathon3y ago· 2 in thread

As a relative newcomer to k8s I was a bit surprised at the lack of backup tools available, coming from the world of on-prem Veeam which had more features than I knew what to do with. In my current role we had to find a way to back up our Postgres DBs running on k8s. We started using Kanister to actually take the backups but found there wasn't much around to actually manage the backups' lifecycle. I ended up writing Taweret (https://github.com/swissDataScienceCenter/taweret), a small tool which just ends up interacting with the Kanister CRDs to delete backups we no longer require based on a defined backups strategy.
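The kind of lifecycle rule described above can be sketched in a few lines. This is not Taweret's actual logic or its configuration format; it is a generic retention policy under assumed parameters (`keep_last`, `keep_daily_days`) that decides which backups are safe to delete.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def backups_to_delete(
    backups: Dict[str, datetime],
    now: datetime,
    keep_last: int,
    keep_daily_days: int,
) -> List[str]:
    """Return the names of backups that fall outside the retention policy.

    Kept: the `keep_last` newest backups, plus the newest backup from each
    of the last `keep_daily_days` calendar days. Everything else is returned.
    """
    newest_first = sorted(backups, key=lambda name: backups[name], reverse=True)
    keep = set(newest_first[:keep_last])
    cutoff = now.date() - timedelta(days=keep_daily_days)
    seen_days = set()
    for name in newest_first:
        day = backups[name].date()
        # The first backup seen for a day is that day's newest one.
        if day > cutoff and day not in seen_days:
            seen_days.add(day)
            keep.add(name)
    return sorted(name for name in backups if name not in keep)
```

A real tool would then delete the corresponding backup objects; that step is left out because it depends on the backup system's own API.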

ithkuil3y ago

But that's what k8s is. It's not a tool that does a thing, but rather a set of APIs and patterns that let you glue together many tools that will let you do a thing (for better or worse)

middle-marathon3y ago

Sure, I meant there wasn't really much around which ran on k8s to manage backups.

debarshri3y ago· 1 in thread

I used to work for an org that deployed 3rd party legaltech "apps" on kubernetes which had all batteries included - Postgres, rabbitmq, redis, you name it I have seen it. Running StatefulSets, even with the best operators out there, with a team of 4 is nothing short of a nightmare. Couple this with the stability of rook ceph.

In 2019, every operator had crazy bugs, and we inherited all of them. You have to solve not just database-level errors but also errors popping up from operators. If you can avoid databases on kubernetes, you should just do it.

twblalock3y ago

One of the big problems with Kubernetes in general, especially back in 2019, is the alpha quality of almost everything in the ecosystem. Especially service meshes.

It seems to get worse the further down the stack you go. I’ve seen tons of problems with operators, monitoring tools, and CNIs.

It’s somewhat better now but there is still a lot of stuff you can’t depend on. The CNCF seems to endorse pretty much anything even if it’s crap.

ahachete3y ago

The key for me is the level of automation that you can reach at a reasonable "development cost". Let me elaborate.

K8s, if anything, is an API. An API that allows you to interact with compute, storage and networks in a way that is abstracted from the actual underlying infrastructure. This is incredibly powerful. You can, essentially, code and automate all your infrastructure.

But this goes beyond deployment, something you could achieve (more or less) with tools like Terraform or Pulumi. Enter "Day 2 operations".

Day 2 operations are essential for any database. And cloud services have done a good job at automating them. Speaking of Postgres, my daily job, things like HA, backups but also minor and major version upgrades are table stakes day 2 operations.

If you want to build these day 2 operations in the cloud (say on VMs), even though you have APIs to do so, a) they don't implement a pattern like Kubernetes' reconciliation cycle; and b) you have a distinct API per cloud. K8s solves both problems, making it way "cheaper" to build such an automation. On K8s, a given operator can code these day 2 operations against K8s APIs. Therefore, if you want to build such automation, either you are a cloud provider (and potentially do this only for your own cloud) or you do it on Kubernetes.

This is so true that existing operators have already gone beyond what DBaaS do. Speaking of StackGres [0] (disclaimer: founder), we have implemented day 2 operations (other than the "table stakes" ones that I mentioned before) that no other DBaaS offers as of today, such as vacuums, repacks and even benchmarks (and more day 2 operations will be developed). See [1] for the CRD specs of SGDbOps, our "Day 2 operations" if you are interested.

[0] https://stackgres.io [1] https://stackgres.io/doc/latest/reference/crd/sgdbops/
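For anyone unfamiliar with the reconciliation cycle mentioned in this comment, the idea is that a controller repeatedly compares desired state with observed state and emits whatever actions close the gap. The sketch below is a toy version under assumed names (`ClusterState`, `reconcile`); it is not StackGres code and does not talk to a real cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    replicas: int
    version: str


def reconcile(desired: ClusterState, observed: ClusterState) -> List[str]:
    """Return the actions needed to move `observed` toward `desired`.
    An empty list means the cluster has converged."""
    actions: List[str] = []
    if observed.version != desired.version:
        actions.append(f"upgrade {observed.version} -> {desired.version}")
    if observed.replicas < desired.replicas:
        actions.append(f"add {desired.replicas - observed.replicas} replica(s)")
    elif observed.replicas > desired.replicas:
        actions.append(f"remove {observed.replicas - desired.replicas} replica(s)")
    return actions


desired = ClusterState(replicas=3, version="15.2")
observed = ClusterState(replicas=2, version="15.1")
print(reconcile(desired, observed))
```

A real operator runs this comparison on every change event and on a timer, so a failed action is simply retried on the next pass instead of needing its own error-handling path.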

sklarsaOP3y ago

Hi, author here! Over the past 6 months, I've been building a hosted service for a database on top of k8s at QuestDB, and wanted to share some of my thoughts on the topic. I was inspired by the recent twitter discussion led by Kelsey Hightower a few weeks ago. Hope you find it interesting!

rdtsc3y ago

> K8s has an extensible Operator pattern that you can use to manage your own Custom Resources (CRs) by writing and deploying a controller

I have seen it fail way too many times. Inspecting a failing deployment that now has some magic Go code someone wrote running on this cluster. I can see using the basic kube building blocks: deployments, pods, config maps, etc.; there are enough guides and tools to help you out. As soon as you start writing code that runs in there, you're now dealing with two problems: your actual thing you're deploying, and now the operator.

Well, and then you need a mesh, and a way to manage certificates, and if it's a database, a way to manage all the volumes. Everything looks good at the architect level - all the boxes and arrows line up, but when it breaks in production it's a nightmare to debug.

xyz-x3y ago

We ran Zalando Operator for Postgres in k8s for a year, until finally succumbing to its technical debt that leaks out from every bit of its software being.

After switching to the Crunchy Data pg operator v5 on k8s, we've had close to zero problems - one or two times a year the log shipping / HA replication fails and we have to restart it, but it's really neat! I can *warmly* recommend it; it really is CloudSQL in K8S.

tonymet3y ago

kuberDBs seems like an unnecessary complication
