And in the thread he mentions Crossplane as the cross-cloud way to do this: https://twitter.com/kelseyhightower/status/12963213771342315...
I also just generally love the thought of being able to manage any cloud resources at all via standard open protocols and systems. Compounding the investment, rather than having to invest in a bunch of specific, non-interconnected areas, is going to lead to great things.
It'll be interesting to see what architectural flourishes or innovations, if any, went into ACK's control loops.
Does it have functionality like KubeDB that makes a "dormant" version of the state store?
If you use something like that, why use K8S and not just use AWS services natively?
At the same time, you usually end up spending more money and having worse results when you don’t go all in.
As far as why use EKS vs ECS - the "native service"? They seem to have feature parity, and ECS is easier to use for the uninitiated. But there are so many people who know k8s, and your knowledge is portable.
Which brings up my second point. Most software engineers don't care about cloud mobility as much as they claim. They care about career mobility. There is a much better chance that you will leave a company and move to one on a different provider than that your company will switch providers. I'm not saying it's a bad thing to focus on technologies that give you, as an individual, the most optionality.
last time I deleted a cluster it failed because an NLB was still around but not accounted for as a CFN resource, even though I provisioned the cluster with CFN.
If a CRD is deleted, the CRs it describes are also deleted. So deleting a CRD (even accidentally) could end up deleting resources in AWS (e.g., backups). So, be careful.
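To make that concrete, here's roughly what an ACK-managed bucket looks like as a custom resource (the API group and field names here are based on the ACK preview and may differ in practice):

```yaml
# Hypothetical ACK-style custom resource for an S3 bucket.
# Deleting the Bucket CRD cascades to CRs like this one, which could
# in turn delete the real bucket behind it.
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: my-backups
spec:
  name: my-backups-bucket   # name of the actual S3 bucket (field name illustrative)
```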
Some things being managed by Kubernetes would be really cool. Other things being managed by k8s could break things if something goes wrong. I would plan accordingly.
Our goal is to make this project have "no surprises" and therefore no unexpected destruction of resources. The specifics of how we mark resources as safe to delete instead of retaining them by default are under discussion in that GitHub issue.
The stateless stacks generally have a lot of development activity going on, and rapidly iterate. This is where most of our code and logic lives. This is where the vast majority of our deployment (and related cloud configuration) activity happens.
All of that thrash is kept away from the stateful stacks - think S3 buckets or DynamoDB tables - where, if THOSE thrash, we potentially get an outage at best, or lose data at worst (backups notwithstanding).
We DO NOT WANT stateless oriented stacks to own the lifecycle for stateful stacks. They inherently need to be treated differently. Or, at least the impact of mistakes is different.
The trick comes when you need to tie them together. To do this, we've added CloudFormation hooks and other deployment time logic that publish ARN and other connectivity info to our configuration store. The stateless services look up config values either during deployment or at runtime and are able to find the details they need to reference the state resources they need access to.
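A minimal sketch of that publish/lookup pattern, with an in-memory dict standing in for a real configuration store such as SSM Parameter Store (the store, key scheme, and helper names here are illustrative, not our actual implementation):

```python
# Hypothetical config store; in practice this would be SSM Parameter
# Store or similar. A dict keeps the sketch self-contained.
CONFIG_STORE = {}

def publish_stack_outputs(stack_name, outputs):
    """Deploy-time hook: publish ARNs/connectivity info from a stateful stack."""
    for key, value in outputs.items():
        CONFIG_STORE[f"/{stack_name}/{key}"] = value

def lookup_config(stack_name, key):
    """Lookup used by stateless services at deploy time or runtime."""
    return CONFIG_STORE[f"/{stack_name}/{key}"]

# The stateful stack's deployment publishes its details...
publish_stack_outputs("orders-data", {
    "table_arn": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
})

# ...and a stateless service resolves them later, without ever owning
# the lifecycle of the underlying resource.
table_arn = lookup_config("orders-data", "table_arn")
print(table_arn)
```

The point of the indirection is that the stateless stack holds only a key, never a hard reference to (or ownership of) the stateful resource.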
We've poked at toolsets like Amplify that lump everything together and have already been bitten numerous times. We've found that the difference between stateful and stateless resources should not be papered over, but instead emphasized and supported explicitly by tooling.
... all of this being one team's experience over the years, of course.
Very curious to see how this paradigm evolves here!
[edit]… Riffing on this just a little bit further… as I’m thinking about it here, it comes down to abstraction level. In a deployment or resource management domain, a generic “this is a cloud resource” isn’t very useful. What’s way _more_ useful is something like “this is a stateful resource” or “this is a stateless resource”, because that level describes resource behavior more clearly, AND how to interface with or manage those resources.
The echoes of software development principles here are intentional: robust cloud infrastructure management mirrors software dev practices as much as infrastructure management ones!
In this case the delete will appear to succeed, but the recreation, if done with the same name, may fail.
Of course we have to try this because it's a badass (tho obvious in hindsight) idea, but in practice it might have some downsides.
This AWS project will need to support a feature like that.
Assume anyone can destroy your infrastructure at any time (by mistake or otherwise). This could be done with CloudFormation, Terraform, API calls, or essentially any automation (with different levels of safeguards).
Be prepared for that. Be careful with your data. Not so careful with individual servers - they should be cattle, not pets.
EDIT: If this is a production system, one could take away any 'delete' permissions until they are needed again.
One layer of defense in all of these cases is keeping the IAM credentials that the configuration management tool uses from having any deletion permissions.
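For example, an explicit Deny on deletion actions in the tool's IAM policy (a sketch; the exact set of actions to deny depends on which resources you manage):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyStatefulDeletes",
      "Effect": "Deny",
      "Action": [
        "s3:DeleteBucket",
        "dynamodb:DeleteTable",
        "rds:DeleteDBInstance"
      ],
      "Resource": "*"
    }
  ]
}
```

An explicit Deny wins over any Allow elsewhere in the policy set, so this holds even if a broader admin policy is attached by mistake.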
Also, set the DeletionPolicy in CF to Retain.
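(Note that DeletionPolicy takes Delete, Retain, or Snapshot rather than a boolean.) On a stateful resource it looks like this, using a DynamoDB table as an example:

```yaml
Resources:
  OrdersTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain        # keep the table even if the stack is deleted
    Properties:
      TableName: orders
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
      BillingMode: PAY_PER_REQUEST
```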
Sadly the startup's interest changed and then it went under (but the freedom I was given to explore there was the best experience I have ever had).
Pass, I'll check back in a year.
On the other hand, my brain segfaults on the recursive loop of how the layer-inversion gets modeled as IaC with a CI/CD pipeline. I guess it could work if you were very strict about having your provider-infra layer (CloudFormation/Terraform) do only the bare minimum to get your kube environment up, and then within that kube environment you used something like ACK to provision any cloud-provider resources your kube-managed apps/pipelines needed.
Yet another case where I'm like "I don't know if kube should be the answer to everything, but I sure as shit won't miss <x>".
And now there is a control loop. So if Ben in support accidentally deletes my queue, it'll get recreated.
Breaking away from the proprietary platform underlay is going to be great. Managing things more consistently is going to be really great.
Kubernetes introduced "controllers": edge-triggered, level-driven reconciliation with periodic resync. The user defines a desired state and the controller does its best to keep the infra in that state at all times. (Terraform has also moved toward this same design in recent times.)
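A toy illustration of that level-triggered model (Python rather than a real controller; the "cloud" here is just a dict, and the resource names are made up):

```python
# Desired state declared by the user (like a CR spec).
desired = {"my-queue": {"type": "sqs"}}

# Observed "cloud" state -- a dict standing in for real infrastructure.
cloud = {}

def reconcile(desired, cloud):
    """One level-triggered pass: converge observed state toward desired."""
    for name, spec in desired.items():
        if name not in cloud:
            cloud[name] = dict(spec)   # create missing resources
    for name in list(cloud):
        if name not in desired:
            del cloud[name]            # prune resources no longer desired

# Initial sync creates the queue.
reconcile(desired, cloud)

# Someone deletes the queue out-of-band...
del cloud["my-queue"]

# ...and the next resync recreates it, because the desired state still says
# it should exist. This is the "Ben in support deletes my queue" scenario.
reconcile(desired, cloud)
print(cloud)
```

Because each pass compares full desired state against full observed state (rather than reacting only to individual change events), out-of-band drift gets repaired on the next resync no matter how it happened.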
This establishes a consistent experience. Everyone knows you just need to run "kubectl get my-resource" to check your desired state, and any issues are surfaced in the resource's status and the controller logs. You can combine multiple controllers to achieve your desired application design. For example, Knative has its own kind called "Service", which has some custom components, some inherited from Istio, and things like ReplicaSets from the default Kubernetes controllers.