Give me Terraform (as much as I hate it) any day.
I won't go as far as to say we burned bridges arguing back and forth about it, but they definitely got significantly singed.
Config files simply don't work until they do. And if it's your job to stare at them for hours and hours a day then maybe that's okay with you, but if you expect other people to 'just learn' it you're an idiot or an asshole. Or both. Ain't nobody got time for magic incantations.
I also think it should tell you you're on the wrong path when your app is named after a verb and the data it deals with is all declarative.
If you’re ignoring guidance and patterns and getting mad reinventing the wheel, that’s on the devs. If “ops” mandates tooling and doesn’t have any skin in the game, that’s on them. And both problems are on your leadership.
If y’all just hate each other and don’t listen or participate, then you can’t be successful. It is ironic that this is the pattern that the devops movement landed us in.
Sure, "use code to deploy infrastructure" sounds great, and that is why we get stuff like Ant, Gradle, Pulumi, Jenkins Groovy scripts, .NET Aspire, and so on... until someone has to debug spaghetti code on a broken deployment.
A DSL like SQL involves one basic substrate (data organized in tables) that you can compile in your head. But declarative infra as code involves a thousand different things across a dozen different clouds.
Declarative will hold off spaghetti for... a bit. But it devolves to spaghetti as well (think fine-grained ACLs, or places where order of operations, which the DSL does not specify and magically resolves, becomes ambiguous).
And if you need to go off the reservation (DSL support doesn't exist or is immature for rapidly evolving platforms, or you need some custom post-processing steps), then you are... what?
Probably writing code and scripts to auto-invoke on the new node, phone home to a central... yup, that's code.
Finally, declarative code has an implicit execution loop. But for something like IaC that loop is very complicated and not well documented. And some committed changes to declarative code may trigger a destructive pass followed by a possibly broken constructive phase.
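To make that last point concrete, here is a toy sketch (plain TypeScript, with invented state shapes — not any real tool's engine) of such an execution loop: diff desired against actual state, and notice how a changed immutable field silently becomes a delete followed by a create.

```typescript
// Toy reconcile loop: resource name -> hash of its immutable config.
type State = Record<string, string>;

function plan(actual: State, desired: State): string[] {
  const ops: string[] = [];
  for (const name of Object.keys(actual)) {
    if (!(name in desired)) {
      ops.push(`delete ${name}`);
    } else if (actual[name] !== desired[name]) {
      // An immutable field changed: the "destructive pass followed by a
      // constructive phase" — and the create can fail after the delete ran.
      ops.push(`delete ${name}`, `create ${name}`);
    }
  }
  for (const name of Object.keys(desired)) {
    if (!(name in actual)) ops.push(`create ${name}`);
  }
  return ops;
}
```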
It's a tough problem.
* You can't have variables in an import block (for example, to specify a different "id" value for each workspace)
* There is no explicit way to make a resource conditional based on variables, only a hacky workaround using "count = foo ? 1 : 0"
* You can't have variables in the backend configuration, making it impossible to store states in different places depending on the environment.
* You can't have variables in the "ignore_changes" field of a resource, making it impossible to dynamically ignore changes for a field (for example, based on module variables).
* The VSCode extension for HCL is slow and buggy. Using TS with pulumi or TFCDK makes it possible to use all the existing tooling of the language.
You get the bonus of controlling the resource id and being able to selectively delete resources without worrying about ordering.
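For contrast with the count hack, here is a minimal sketch (plain TypeScript with hypothetical resource shapes, not real Pulumi/CDKTF API calls) of why a general-purpose language helps: a conditional resource is just an if statement.

```typescript
// Hypothetical resource config; in Pulumi/CDKTF this would be a real resource.
interface BucketConfig { name: string; }

function makeBuckets(enableLogging: boolean): BucketConfig[] {
  const buckets: BucketConfig[] = [{ name: "app-data" }];
  if (enableLogging) {
    // A plain conditional — no `count = var.enable_logging ? 1 : 0` dance.
    buckets.push({ name: "app-logs" });
  }
  return buckets;
}
```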
I’ve been burned so many times here that I hate all of this stuff with an extreme passion.
Crossplane seems to be a genuinely better way out but there are big gotchas there also like resources that can simply never be deleted
I find that with some handwringing, C# can be forced to do almost anything. Between extension methods, dispatch proxies and reflection you can pummel it into basically any shape.
Having to write a little boilerplate to make it happen can be a drag though. I do sometimes wish C# had something from a blank project that let me operate with as much reckless abandon as Object.assign does in js land.
I use C# extensively for most other things I do, but this is the one area where I prefer not to use it.
Terraform sure is a quirky little DSL ain’t it? It’s so weirdly verbose.
But at the same time I can create some Azure function app, set up my GitHub build pipeline, get Auth0 happy and in theory hook up parts of Stripe all in one system. All those random diverse APIs plumbed together and somehow it manages to work.
But boy howdy is that language weird.
But yeah, at $previous_job, Terraform enabled some really fantastic cross-SaaS integrations. Stuff like standing up a whole stack on AWS and creating a statuspage.io page and configuring Pingdom all at once. Perfect for customers who wanted their own instance of an application in an isolated fashion.
We also built an auto-approver for Terraform plans based on fingerprinting "known-good" (safe to execute) plans, but that's a story for a different day.
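A sketch of what such a fingerprinting auto-approver could look like (this is my guess at the approach, not their implementation): reduce the `terraform show -json` plan to its (address, actions) pairs and hash them, so any plan with the same shape matches a previously approved fingerprint.

```typescript
import { createHash } from "crypto";

// Minimal subset of the terraform plan JSON shape that we care about.
interface ResourceChange { address: string; change: { actions: string[] } }
interface Plan { resource_changes: ResourceChange[] }

function fingerprint(plan: Plan): string {
  // Sort so that resource ordering in the plan doesn't change the hash.
  const shape = plan.resource_changes
    .map(rc => `${rc.address}:${rc.change.actions.join(",")}`)
    .sort()
    .join("\n");
  return createHash("sha256").update(shape).digest("hex");
}
```

A real version would also need to fingerprint the attribute values being changed, or a "known-good" create could hide arbitrary payloads.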
Honestly, I only use terraform with hiera now, so I pretty much only write generic and reusable "wrapper" modules that accept a single block of data from Hiera via var.config. I can use this to wrap any 3rd party module, and even wrote a simple script to wrap any module by pointing at its git project.
That probably scares the shit out of folks who do the right thing and use a bunch of vars with types and defaults. But it's so extremely flexible, and it neutered all of the usual complexity and hassle I had writing terraform. I have single-handedly deployed an entire infrastructure via terraform like this, from DNS domains up through networking, k8s clusters, helm charts and monitoring stack (and a heap of other AWS services like API Gateway, SQS, SES etc). The beauty of moving all of the data out to Hiera is that I can deploy new infra to a new region in about 2 hours, or deploy a new environment to an existing region in about 10 minutes. All of that time is just waiting for AWS to spin things up. All I have to do in code is literally "cp -a eu-west-1/production eu-west-2/production" and then let all of the "stacks" under that directory tree deploy. Zero code changes, zero name clashes, one-man band.
The hardest part is sticking rigidly to naming conventions and choosing good ones. That might seem hard because cloud resources can have different naming rules or uniqueness requirements. But when you build all of your names from a small collection of hiera vars like "%{product}-%{env}-%{region}-uploads", you end up with something truly reusable across any region, environment and product.
I'm pretty sure there's no chance I'd be able to do this with Pulumi.
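The naming idea above can be sketched in a few lines (TypeScript here purely for illustration; the real setup would use Hiera's own %{var} interpolation, and the pattern and var names are made up to mirror the example):

```typescript
// Expand a "%{product}-%{env}-%{region}-uploads"-style pattern from a small
// set of vars, so every resource name is derived the same way everywhere.
function buildName(pattern: string, vars: Record<string, string>): string {
  return pattern.replace(/%\{(\w+)\}/g, (_, key) => {
    const value = vars[key];
    if (value === undefined) throw new Error(`missing var: ${key}`);
    return value;
  });
}
```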
Making Terraform changes every six weeks was enough time that we forgot everything and had to refresh our memories. Every time it felt like going into the water at a northern beach and forgetting how goddamned cold the water was, then reproaching yourself for forgetting.
https://helm.sh/docs/chart_template_guide/control_structures...
You have the YAML/JSON that the k8s API wants, which is fed through helm, which is fed through helmsman or whatever newer thing. There might be a layer or two of other templating around it. Sometimes companies have built systems so developers/devops don't even have the ability to see the final compiled version of the template, which is the mother of all "works on my laptop" problems.
It's super easy to break text based templating because of some space, tab, string escaping or whatever.
YAML makes it worse, as there are lots of gotchas and different ways of doing things. JSON, being quite verbose and inflexible, at least has strong structure right in your face, so it's a bit easier to figure out what went wrong.
With a proper programming language data structure you can be much better with verifying that the things you add or remove or iterate over will produce a valid result, much better refactoring and working as a team independently.
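A minimal sketch of that idea (TypeScript, with a deliberately stripped-down manifest shape): build the manifest as a typed data structure and serialize it, instead of splicing strings into a YAML template where a stray space or tab silently changes the structure.

```typescript
// Stripped-down Deployment shape — real manifests have far more fields.
interface Container { name: string; image: string; }
interface Deployment {
  apiVersion: "apps/v1";
  kind: "Deployment";
  metadata: { name: string };
  spec: { replicas: number; template: { spec: { containers: Container[] } } };
}

function makeDeployment(name: string, image: string, replicas: number): Deployment {
  // The type system guarantees the nesting is right; no indentation to break.
  return {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name },
    spec: { replicas, template: { spec: { containers: [{ name, image }] } } },
  };
}

// JSON is valid YAML, so this output can be fed straight to the k8s API.
const manifest = JSON.stringify(makeDeployment("web", "nginx:1.27", 3), null, 2);
```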
The complexity must, in one way or another, be preserved within the abstraction (in all likelihood), or you will have cases you cannot create in that layer, or breakages which now require the total complexity of both the abstraction itself AND Kubernetes to fix.
I would not say IaC is going to provide you a magic solution to learning k8s, although the value in using IaC (e.g. Argo CD / Flux CD + Kustomize + ...) in K8s land is that you are no longer imperatively managing your cluster resources, and can therefore keep them within a repository, managed like code. The point of the solution is not to make it easier for newcomers, but to make it easier for teams to manage and work together on an established cluster for deployments, ...
In the case of Pulumi, you leverage a single language with typechecking instead of relying upon K8s-flavoured YAML, which is itself beneficial in many ways (since you can use your regular developer tooling).
Wrt pkl, pretending the K8s manifest structure underneath isn't there does not help, because you will need to know how the keys within a manifest interact with the underlying system regardless, especially to understand functionality, e.g. node selectors, taints and tolerations, node affinity, ...
I previously managed a Terraform-based deployment of several k8s clusters and it still required knowledge of those keys and values, alongside knowledge of the underlying resource types.
Without those you can't implement things like GPU-based node selection for jobs which require a GPU, ...
Just use CloudFormation. Easy to write, declarative, vars (Parameters and Output exports). The trick is not to pile everything into one Stack. Use several.
And it generates shitty CFN; we can do better ourselves :)
It’s got everything you want:
- strong type system (TS),
- full expressive power of a real programming language (TS),
- can use every existing terraform provider directly,
- compiles to actual Terraform so you can always use that as an escape hatch to debug any problems or interface with any other tools,
- official backing of Hashicorp so it’s a safe bet
It’s a super power for infra. If you have strong software dev skills and you want to leverage the entire TF ecosystem without the pain of Terraform the language, CDKTF is for you.
(No affiliation)
But all in all, it works. It's just a bit limited on what you can do with the actual language.
I suppose TypeScript does count as a real programming language, in that it’s Turing complete. But I can use Pulumi from (they claim) any programming language. Specifically, I can use it from Go. Why would I add TypeScript to my project when I can live in one language?
> - official backing of Hashicorp so it’s a safe bet
Given the number of folks leaving the Hashicorp platform, I think it’s arguably no longer a ‘safe bet.’
It turns out terraform is actually quite acceptable when you slap a decent language on top of it. Passable, even :)
We've been migrating off of Terraform at BigCo recently and it has been a tremendous success. The migration has saved countless hours. Before, I was jaded and routinely in the office until 8 or 9 or so manually running terraform deploys for our engineering teams in India. Now, thanks to Pulumi, I'm able to leave the office at 7:30-8 -- and I can tell you single handed that this has saved my relationship with my daughter and maybe even my marriage. I'm running the fastest for loops thanks to Pulumi. We actually compile our Python down to c and use the Pulumi C SDK for insane speed benefits when we loop over our datacenter arrays. Turns out, not having bounds checks shaves off valuable time that I would otherwise be spending with my daughter. Routinely I'd be waking up screaming at 4 in the morning due to Terraform (or, what we would refer to as Tearaform because all of the infra engineers were constantly in tears). Now, I can sleep soundly until 5:30.
Pro vs pulumi: you get a declarative template to debug and review
Pro vs CDK: The declarative template is applied via APIs instead of CloudFormation. The CDK CloudFormation abstraction leaks like hell
I've heard it referred to as an "optionally typed" or "gradually typed" system, which, having worked for years in TypeScript and other languages like Rust and Kotlin, etc., I agree with.
All of CDK does things in cloudformation, which made the whole thing stillborn as far as I’m concerned.
The CDK team goes to some lengths to make it better, but it’s all lambda based kludges.
Just write CloudFormation directly. Once you get the hang of the declarative style and become aware of the small gotchas, it's pretty comfy.
Exactly this. And don't make huge templates, split stuff logically to several stacks and pass vars via export/importvalue.
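An illustrative fragment of what that export/import plumbing looks like (two separate templates shown together, with all resource and export names made up):

```yaml
# Stack "network": export the VPC id under a well-known name
Outputs:
  VpcId:
    Value: !Ref MainVpc
    Export:
      Name: network-VpcId

# Stack "app": import it by that export name
Resources:
  AppSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !ImportValue network-VpcId
      CidrBlock: 10.0.1.0/24
```

Note that CloudFormation will refuse to delete or change the exporting stack's output while another stack imports it.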
This is one hyper annoying area.
It is possible to get around it, but it's ugly: drop to L1 and override the logical ID:
const vpc = new ec2.Vpc(this, 'vpc', { natGateways: 1 })
// Grab the underlying L1 (CloudFormation) resource...
const cfnVpc = vpc.node.defaultChild as ec2.CfnVPC
// ...and pin its logical ID so refactoring doesn't trigger replacement
cfnVpc.overrideLogicalId('MainVpc')
You have to do this literally for every resource that's refactored. For us, we run 2 stacks. One that basically cannot/should-not be deleted/refactored: VPC, RDS, critical S3 buckets, i.e. critical data.
The 2nd stack runs the software and all those resources can be destroyed, moved whatever w/o any data loss.
But circular dependencies can also lead to issues here where CDK will prevent you from deleting a resource used or referenced by a different stack.
The problem with upserting is that if the resource already exists, its existing attributes and behavior might be incompatible with the state you're declaring. And it's impossible to devise a general solution that safely transitions an arbitrary resource from state A to state A' in a way that is sure to honor your intent.
If you don't mind sharing, suppose (because it's what I was doing) I was trying to create personal dev, staging, and prod environments. I want the usual suspects: templated entries in route53, a load balancer, a database, some Fargate, etc.
What are you meant to do here? Thank you.
So dumb. Trying to move to SST for only that reason
But if you add cdk to the PATH, you can still deploy; it's just that your CI/CD and deployment scripts are not all using bun anymore.
You have to do a few adjustments which you can see here https://github.com/codetalkio/bun-issue-cdk-repro?tab=readme...
- Change app/cdk.json to use bun instead of ts-node
- Remove package-lock.json + existing node_modules and run bun install
- You can now use bun run cdk as normal
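The first adjustment boils down to a one-line change in cdk.json, something like this (the entrypoint path here is just an example — use whatever your app's entrypoint is):

```json
{
  "app": "bun run bin/app.ts"
}
```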
I don't care about powerful. That's the opposite of what I want. I could just use k8s if I cared about that.
Why is this better than Ansible + Docker Compose?
What it provides is a set of conventions based on what most web apps look like.
Eg. built-in proxy with automatic TLS and zero downtime deployments, first-class support for a DB and cache, encrypted secrets, etc.
It’s definitely not for every use case, but for your typical 3-tier monolith on a handful of servers I found it does the job well.
Give me a forum (even Discourse will do); I'm tired of needing 3rd-party spyware to interact with developers. That it is all closed off from search engines makes it even worse.
We've gone through a lot of pain to get this blueprint working since our AWS costs were getting out of hand but we didn't want to part ways with CDK.
We've now got the same stack structure going with Pulumi and Digital ocean, having the same ease of development with at least 60% cost reduction.
It’s not a drop in replacement. It might be worth it depending on what you’re doing.
I’m sure there are lots of DO clients seeing the same things we did, but not realizing it.
We did see it (multiple DCs—we didn’t just not try to fix this before going to AWS) in multiple cases with tens of clients so if there’s good news it’s that if you can monitor like 100 clients distributed over a wide area and all of them behave as expected you may not be experiencing what we did. What we saw was closer to 5% with absurd slowness or frequently-dropped connections than to 0.01%.
And if you are just operating a website and sticking Cloudflare or whatever in front of DO anyway, this doesn’t matter. I expect that’s why it’s not a more widely-reported issue.
Anyone using CDK should switch to Pulumi though.
Using a complex programming language (C++ of the browser world) just for this has a big switching cost. Unless you're all in on TS. And/or have already built a huge complex IaC tower of babel where programming-in-the-large virtues justify it.
If I had to guess it's because
- more developers from an imperative background need to work with infrastructure, and they bring over their mindset and ways of working
- infrastructure is more and more available through APIs, and it saves a lot of effort to dynamically iterate over cattle rather than declaratively deal with pets
- things like conditionals, loops and abstractions are very useful for a reason
- in essence the declarative tools are not flexible enough for many use cases or ways of working, using a programming language brings infinite flexibility
Personally I am more in the declarative camp and see the benefits of it, but there is a certain amount of banging one's head against its rigidity.
It is classic "every problem is a nail to the person with a hammer". Complex languages - by definition - can solve a wider variety of problems than a simple declarative language but - by definition - are less simple.
Complex languages for infra - IMO - are the wrong tool for the wrong job because of the wrong skills and the wrong person. The only reason why inefficiencies like this are ever allowed to happen is money.
"Why hire a dev and an ops when we can hire a single devops for fractionally less?" - some excited business person or some broken dev manager, probably.
(For bigger stuff, apparently CF has some limits relating to resources per single stack)
The property that equates to config files is "being static", which modern deployments are not.
We've also started switching our custom Docker compose + SSL GitHub Action deployments to use Kamal [1] to take advantage of its nicer remote monitoring features
With Terraform or CDK, I would want a simple shareable thing that did the boilerplate, which I called with any variables I needed to change.
On EKS, you need to do the same version updates with the same amount of terror.
You do pay extra for the further management just to run containers somewhere!
(you might want to say "every" instead of over, "is" instead of "ist")
On one hand, I can see how this is an unfalsifiable standard; on the other hand, I can see the utility of solving a friction point for people who messed up.
The alternative, which I feel is far too common (and I say this as someone who directly benefits from it): You choose AWS because it's a "Safe" choice and your incubator gets you a bunch of free credits for a year or two. You pay nothing for compute for the first year, but instead pay a devops guy a bunch to do all the setup - In the end it's about a wash because you have to pay a devops guy to handle your CI and deploy anyway, you're just paying a little more in the latter.
I won't touch DO after they took my droplet offline for 3 hours because I got DDoS'd by someone that was upset that I banned them from an IRC channel for spamming N-bombs and other racial slurs.
And can you name a real cloud that charges a half-reasonable price for bandwidth? I consider $10/TB to be half-reasonable.