It's funny because I believe they used to have a per seat licensing and then people complained that they were paying too much for large orgs with a few resources.
Anyways, nowadays I just use aws-cdk. I like the freedom of scripts and the extreme power that comes with them. It is vendor specific, but nothing is really cloud-agnostic; there's always some requirement to have some understanding of the platform's terminology and nuances, so this point is pretty moot. It also requires a little more planning than just writing something declarative and letting a tool figure out the incremental changes, but I like it because it lets me better understand what's happening without runaway complexity.
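For reference, a minimal CDK program really is just a TypeScript script that declares resources (the stack and bucket names below are made up for illustration):

```typescript
import * as cdk from 'aws-cdk-lib';
import { aws_s3 as s3 } from 'aws-cdk-lib';

// A CDK app is an ordinary script: construct an App, a Stack,
// and whatever resources you need inside it.
const app = new cdk.App();
const stack = new cdk.Stack(app, 'DemoStack');

new s3.Bucket(stack, 'ArtifactBucket', {
  versioned: true,
  removalPolicy: cdk.RemovalPolicy.DESTROY, // fine for a demo, not for prod
});

app.synth(); // emits a CloudFormation template; `cdk deploy` applies it
```

Because it's plain code, you get loops, functions, and your editor's tooling for free, at the cost of being tied to CloudFormation underneath.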
That limitation makes Pulumi a nonstarter for me. Even if I am paying for it, this is super sensitive data and I want control over how it's stored. I'm sure Pulumi Cloud is excellent, but I hate that it's the only real option.
The AI assistant seen here is a very good idea. I will actually share this with my team.
I imagine the terraform conversion doesn't work when you use in-house written providers (in golang), is this correct?
If you need help with this, please reach out to me on the Pulumi community Slack at https://slack.pulumi.com, and I'll be happy to help.
I recommend checking out Spacelift[0]. It's a CI/CD system specialized for Infra as Code, including Terraform and Pulumi. We have a ton of additional features as well as a much more reasonable pricing model, so we've recently been talking a lot to folks wanting to migrate from TFC due to the changes you've mentioned.
You can easily reach out to us on the website by scheduling a demo, using the chat widget, or just sending me an email (see my profile).
[0]: https://spacelift.io
Disclaimer: Software Engineering Team Lead at Spacelift, so take this with a healthy grain of salt, but I do legitimately think it's a great product.
Bit of a catch-22: setting up managed Postgres for state, plus a Tailscale network with a VM as exit node (to access Postgres via VPN/LAN, not the Internet), all via Terraform, and then switching state from local (used for the initial setup) to Postgres.
But it's nice to be able to, e.g., interact with a different provider for DNS, supplying IPs from other resources.
If however a dedicated devops team has to manage it eventually, and there are multiple dev teams who all use different languages, then Terraform is better, since it’s practically impossible to learn all these languages. Terraform will give you a dumbed down enough lingua franca.
Also, Terraform helps you keep code complexity to a minimum. There's no chance of finding some "clever" self-referencing function or other language-specific crazy stuff solving a problem in it; it forces you to keep things simple.
How is this a disadvantage?
If you can mandate Terraform and HCL across the org you can mandate Pulumi and a language eg Typescript. What’s the difference?
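To illustrate the point: a Pulumi program in a mandated language like TypeScript reads about as plainly as the equivalent HCL. A minimal sketch (resource names are made up):

```typescript
import * as aws from '@pulumi/aws';

// A Pulumi program is just a module that declares resources,
// much like an HCL file declares resource blocks.
const bucket = new aws.s3.Bucket('artifacts', {
  versioning: { enabled: true },
});

// Outputs are exported like ordinary module members.
export const bucketName = bucket.id;
```

Whether the org can enforce "only simple declarations, no clever abstractions" in a general-purpose language is the real question, not the syntax itself.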
You can also use Terraform with a programming language, i.e. CDKTF, so the Terraform choice doesn’t make it simpler.
CDKTF is totally optional and besides the point. Could as well say you can use Pulumi with Makefiles so the Pulumi choice doesn’t make it simpler.
Terraform can’t replace it well. Maybe it can facilitate the transition to more cloud-native setups where you don’t need Ansible at all, but only within limits. If you’re asked to implement something there’s no cloud offering for, and it’s slightly more complex, you’re back to Ansible.
Or something like that. My CS-fu is admittedly weak.
OTOH, I think Lisp would be the perfect choice for IaC.
I think reinventing the wheel every time a for loop is needed in a resource spec makes things unnecessarily tedious for everyone.
You always need loops, control structures, etc., and in Pulumi you get them automatically because they're features of whatever language you use it with.
Terraform users hack around it all the time (think of "count" as if/else, for instance) precisely because what you need is never purely declarative config, and the level of dynamism you need is not semantically supported by Terraform.
So, we are stuck with hacks...
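For example, in Pulumi's TypeScript SDK a plain `if` and `map` do what `count` and `for_each` get hacked into doing (the IDs and names below are placeholders):

```typescript
import * as aws from '@pulumi/aws';

const config = { createReplica: true, azs: ['us-east-1a', 'us-east-1b'] };

// An ordinary loop instead of Terraform's count/for_each meta-arguments.
const subnets = config.azs.map((az, i) =>
  new aws.ec2.Subnet(`subnet-${i}`, {
    vpcId: 'vpc-12345678', // placeholder
    cidrBlock: `10.0.${i}.0/24`,
    availabilityZone: az,
  })
);

// An ordinary if-statement instead of `count = var.create_replica ? 1 : 0`.
if (config.createReplica) {
  new aws.rds.Instance('replica', {
    instanceClass: 'db.t3.micro',
    replicateSourceDb: 'primary-db-identifier', // placeholder
  });
}
```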
I've used Terraform and Pulumi in the past, and the "platform agnostic" claim is only true for trivial builds. Once you get into medium-sized infra you're writing so much AWS-specific code that it makes more sense to go first party.
I can't speak for GCP, but this is what happened to us. We kept fighting third-party code until we ended up going with CDK. While there are still issues, there were fewer of them. CloudFormation manages the state under the hood anyway, so we're all just stuck wrapping that sucker, even in the CDK.
- random provider to generate a db password, cloud provider to provision db and admin user, random and mysql providers to provision additional non-admin users, k8s provider to upload credentials to secrets
- tls provider to create ca, k8s provider to create namespace, create certs for each k8s namespace, upload to k8s secrets
- Cloud provider to issue service account key, GitHub provider to upload to GHA (don't do this anymore since oidc is supported, before it was rather important)
While I haven't used CDK much, I believe it is still basically about provisioning aws resources and would not have such cross cutting configs, though I think they may have had a mechanism for importing tf providers.
Terraform can have its issues but overall being able to provision such diverse resources with a single command has been great for onboarding/reducing human error. I'm sure there are plenty of cases where CDK is easier too, these are just to demonstrate why you may use TF even when locked in.
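The first bullet could be sketched roughly like this. Shown here with Pulumi's TypeScript SDK, which bridges the same random/RDS/Kubernetes providers; all names and settings are illustrative:

```typescript
import * as random from '@pulumi/random';
import * as aws from '@pulumi/aws';
import * as k8s from '@pulumi/kubernetes';

// The random provider generates the password...
const dbPassword = new random.RandomPassword('db-password', {
  length: 32,
  special: false,
});

// ...the cloud provider provisions the database with it...
const db = new aws.rds.Instance('app-db', {
  engine: 'mysql',
  instanceClass: 'db.t3.micro',
  allocatedStorage: 20,
  username: 'admin',
  password: dbPassword.result,
  skipFinalSnapshot: true,
});

// ...and the k8s provider uploads the credentials as a Secret.
new k8s.core.v1.Secret('db-credentials', {
  metadata: { name: 'db-credentials' },
  stringData: {
    host: db.address,
    password: dbPassword.result,
  },
});
```

The value in either tool is that one plan/apply threads outputs (the password, the DB address) across completely different providers.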
If you solely work with one cloud and don't intend on switching cloud providers or working in a context where multiple cloud providers are used, then by all means specialise. Personally I'm working as a software engineer, so I don't really care in which cloud it runs, as long as I can provision whatever resources I need.
When trying to achieve a cloud agnostic approach, there are some other tools focused on that (such as serverless framework).
There are a number of other subtler topics as well. In my experience, Terraform tends to be noticeably quicker to run (when the limiting factor is not some AWS service that's spinning up for ages - but usually that is not the case after the first run). And just because you're mostly vendor locked, doesn't mean you don't have a few 3rd party vendors to also integrate; it's nice to have a wider support for these providers. While the CDK's types are more consistent than the Terraform ecosystem and generally lead to faster prototyping, the type system itself only protects you from a small subset of errors. You can still wire things together that will never work, and you'll only find out at deploy time. Lastly, I've often found the CDK's versioning to be fairly odd. I get the idea behind stable and experimental packages, and the need to be explicit about these. However, it seems odd that certain packages spend _years_ in experimental, despite being absolutely mainstream from the very beginning (I am looking at you, API Gateway V2 - though it's been a few months since I last checked).
(OK, so this was a bit more tongue-in-cheek than I normally go for. I'm somewhat frustrated by CDK because it has so much going for it, that when it disappoints me, it really hurts.)
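A sketch of the kind of deploy-time failure the type system can't catch (hypothetical CDK TypeScript; the VPCs and instance are made up):

```typescript
import * as cdk from 'aws-cdk-lib';
import { aws_ec2 as ec2 } from 'aws-cdk-lib';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'TypesDemo');

const vpcA = new ec2.Vpc(stack, 'VpcA');
const vpcB = new ec2.Vpc(stack, 'VpcB');

// A security group that lives in VpcA...
const sg = new ec2.SecurityGroup(stack, 'Sg', { vpc: vpcA });

// ...attached to an instance in VpcB. The types line up (it's just an
// ISecurityGroup), so this compiles fine and only fails at deploy time.
new ec2.Instance(stack, 'Box', {
  vpc: vpcB,
  instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MICRO),
  machineImage: ec2.MachineImage.latestAmazonLinux2(),
  securityGroup: sg,
});
```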
This is very true. We didn't get far enough to get into a drift issue to compare CDK vs 3rd party, so this is new to me, thanks for that. Admittedly, we solved this by saying no one is allowed to click buttons at all on the console for certain envs.
What's worked well for us is using multiple sub-accounts which handle dev(s), stage, UAT, and prod. Dev accounts are purely temporary: use the CDK to stand one up, then click all you want while you "figure it out".
Stage, UAT, and Prod are all code only.
We also separate Network vs Application stacks, but the split is really delete-able vs critical. Dangerous to delete/mess with: Route53, RDS, VPC, S3 (some). Delete-able: Lambda, EC2, ElastiCache, etc. If we lost the latter, we don't "lose" anything other than downtime (no backups or customer data gone), and restoring them is trivial.
Also agreed on the experimental packages and needing to use the escape hatches. We have L1 and L2 constructs mixed all over the place. We use almost no L3 because it's like Hello World: great for demos but not really practical, and it's easier to customize the L2s, since getting access to individual resources inside an L3 is janky (though we do do it for a few L2s).
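For anyone unfamiliar with the escape hatches: dropping from an L2 to its underlying L1 looks roughly like this (the property being overridden is just an illustration):

```typescript
import * as cdk from 'aws-cdk-lib';
import { aws_s3 as s3 } from 'aws-cdk-lib';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'EscapeHatchDemo');

// Start with the convenient L2 construct...
const bucket = new s3.Bucket(stack, 'Logs');

// ...then reach through to the underlying L1 (CfnBucket) to set raw
// CloudFormation properties the L2 doesn't expose.
const cfnBucket = bucket.node.defaultChild as s3.CfnBucket;
cfnBucket.addPropertyOverride('ObjectLockEnabled', true);
```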
After having tried Pulumi, I felt it does not bring the claimed benefits: you need to write code in a non-idiomatic way, and coming up with abstractions to represent the infrastructure takes more time than just writing proper Terraform and performing thorough reviews, which also facilitates knowledge sharing in the organization.
Also, then you'd have three layers as sources of bugs instead of just one: your code (you tested it properly, didn't you? So much time saved compared to just doing proper reviews!), Pulumi itself (I've faced its bugs on a task as simple as setting up an AWS Lambda), and the Terraform it uses under the hood.
In my experience pulumi doesn't solve problems for the business, but lets engineers waste time and money on playing around with their NIH abstractions. We tried it, and dumped it, and went back to terraform, which just works.
In HCL it’s difficult to break repeatability, for example by querying external sources in an uncontrolled way.
We're coming up on 10,000 resources in our main Terraform repository, and while there is definitely some friction, it's overall much better than having to hit the cloud APIs to gather each of those states, which would probably take at least an order of magnitude longer.
We also just recently started setting up a periodic drift detection build to help identify and address drift.
Could you elaborate on the circumstances that have caused your state sync issues?
I don't think that's necessarily true. Most cloud APIs can actually return hundreds of records with one API call; e.g. https://docs.aws.amazon.com/elasticloadbalancing/latest/APIR... has a maximum page size of 400.
If I manage the cloud resources via some custom tools and/or with some ansible-fu, I can decide to batch the API calls when it makes sense.
With terraform, it is not possible to do so (https://github.com/hashicorp/terraform-plugin-sdk/issues/66, https://github.com/hashicorp/terraform-provider-aws/issues/2...).
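The batching pattern is essentially a marker-based paginator. A minimal, generic sketch (the `Page` and `collectAll` names are mine, not from any SDK); in practice you'd wire `fetchPage` to something like the ELB API's `DescribeLoadBalancersCommand` with `PageSize: 400`:

```typescript
// One API result page: up to `PageSize` items plus an optional
// continuation marker for the next call.
type Page<T> = { items: T[]; nextMarker?: string };

// Collect every item by following markers: one network round trip per
// batch of records, rather than one per resource.
async function collectAll<T>(
  fetchPage: (marker?: string) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let marker: string | undefined;
  do {
    const page = await fetchPage(marker);
    all.push(...page.items);
    marker = page.nextMarker;
  } while (marker !== undefined);
  return all;
}
```

Terraform's provider SDK reads state per resource, so it can't take advantage of a helper like this even though the underlying APIs support it.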
I'm not really sure how you'd do it without a state file though? State drift is very common and without an idea of the current state it's pretty difficult for any framework to work out what updates need to be applied.
State is a problem here too: I had just the same kind of hardships when I had to manually modify it for some changes as I would have had with Terraform (transferring resources between state stores with zero downtime as a business function was handed over within the company; the import docs are not always correct or up to date, just as with Terraform, as if the example code were generated by ChatGPT).
Overall, Pulumi couldn't convince me. It wasn't a bit more convenient than Terraform, yet I bumped into its bugs for pretty basic AWS features (around Lambda/API Gateway, though that was more than a year ago and might be fixed by now) that worked just fine with Terraform out of the box, and the Pulumi code felt bad to look at despite being in a language I generally like. It did not bring the effect management wanted, namely that developers with no Terraform knowledge could be onboarded to infra, since of course infra knowledge is still needed, and that is harder to pick up than Terraform syntax.
Despite my dislike for Terraform and its syntax's limitations in expressiveness, I'm still back on TF, and I'm more productive than I was with Pulumi.
I got myself into a situation recently that I could only get out of by using the CLI interactively. I wound up with multiple copies of each resource, which shared a URN, so when I tried to delete them from the CLI it would always prompt me for which instance of that URN to delete. I ended up spending much of a day writing a program to call their CLI and then interact with it programmatically because I had hundreds of resources to delete.
Since then I’ve been doing more with CloudFormation directly and am tempted to switch.
Also: don't use static AWS creds, kids
It depends.
The AWS "Classic" provider uses the terraform provider [1].
The AWS "Native" provider does not, and instead uses the AWS Cloud Control API [2].
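Side by side, the two look like this (bucket names are made up; `@pulumi/aws` is the bridged classic provider, `@pulumi/aws-native` the Cloud Control one):

```typescript
import * as aws from '@pulumi/aws';              // "Classic": bridged Terraform provider
import * as awsNative from '@pulumi/aws-native'; // "Native": AWS Cloud Control API

// Same logical resource, two providers. The classic one mirrors the
// Terraform provider's schema...
const classicBucket = new aws.s3.Bucket('classic-bucket');

// ...while the native one mirrors the CloudFormation/Cloud Control schema.
const nativeBucket = new awsNative.s3.Bucket('native-bucket', {});
```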
Not everything is available compared to TF HCL. It uses constructs from the AWS CDK and tries to bridge things to TF via codegen. There are a lot of weird corner cases and inconveniences. It even uses React under the hood.
Pulumi at least takes care of it better. Maybe not perfectly but better.