I haven’t shared your experience, but I read OP's book on Terraform cover-to-cover before trying to work with the system.
- Bad UX
- Tool does not have interactive mode to provide suggestions or simple solutions to common problems
- Lack of options or commands for commonly-used tasks, like refactoring resources, modules, sub-modules, etc. (Figuring out 'state mv', 'state rm', etc. is left as an exercise for the user and takes forever)
- Complains about "extra variables" found in tfvars files, making it annoying to re-use configuration, even though having "extra variables" poses no risk to operation
- (NEW) The plan output shows you what has changed, followed by what will *actually be changed* on apply. Both look the same, so you get confused and think the first part matters, when it's actually irrelevant.
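To illustrate the refactoring complaint above: renaming a resource means either manual state surgery from the CLI, or (since Terraform 1.1) hand-writing a `moved` block per rename. A minimal sketch, with hypothetical resource names and values:

```hcl
# Old way: state surgery from the CLI, one address at a time:
#   terraform state mv aws_instance.web aws_instance.web_server

# Newer Terraform (1.1+) at least lets you declare the move in HCL,
# but you still write one block per renamed resource:
moved {
  from = aws_instance.web
  to   = aws_instance.web_server
}

resource "aws_instance" "web_server" {
  ami           = "ami-12345678"   # hypothetical values
  instance_type = "t3.micro"
}
```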
- Bad internal design
- HCL has a wealth of functions, yet is too restrictive in how you can use them. You will spend an entire day (or two) tying your brain in knots trying to figure out how to construct the logic needed to append to an array in a map element in an array in a for_each for a module (which wasn't even possible a few years ago).
- Providers are inconsistent and often not written well, such as not providing useful error messages or context.
- Common lifecycle policy conventions per resource type have to be discovered by trial and error (rather than being the default, or at least hinted at), or you will end up bricking your gear after it's already deployed.
- The tool depends on local state and, optionally, remote state. Local state litters module directories, even though nearly everyone who uses Terraform at scale treats modules as libraries/applications, not as the directory they execute the tool from. Several different wrappers were invented that default to changing this behavior, because it has been a problem for years.
- Default actions and best practices (such as requiring a plan file before apply or destroy, or automatically running init, then get, then validate) are left for the user to figure out rather than done for them (again, wrappers had to solve this).
- Some actively dangerous things are the default, like overwriting backup state files (if they're created by default).
- Version management of state is left up to the user (or remote backend provider)
- Not designed for DRY code or configuration; multiple wrappers had to implement this
- You can't specify backend configuration via the -var-file option, and backend configuration can't be JSON ... why? They just felt like making it annoying: some "philosophical" development choice that users hate and that makes the tool harder to use.
- Workspaces are an anti-pattern; you end up not using them at scale.
- You can't use count or for_each in provider blocks, so if you want a configurable number of providers (say, each with different credentials), tough luck. ("We're Opinionated!")
- Can't use variables in a backend block. ("We're Opinionated!")
- Can't have more than one backend per module. ("We're Opinionated!")
- Lots of persistent bad behavior has only been fixed in recent releases, like not pushing state changes as resources are applied; there's more I can't remember.
- Global lock on state, because again, ya can't have more than one backend block per module.
- All secrets are stored as plaintext in the state file, so either you don't manage secrets *at all* with Terraform, or you admit that your Terraform state is highly sensitive and needs to be segregated from everyone/everything and nobody can be given access to it.
- No automatic detection of, or import of, existing resources. It knows they're there, because it fails to create them (and doesn't get a permission error back from the API), but it refuses to then give you the option of importing them. The *terraformer* project had to be invented just to get a semblance of auto-import, when a hundred lines of code in Terraform itself could have saved everyone years of work.
- You can't write modules, logic, providers, etc. in an arbitrary executable. Other tools allow this so you can ramp up on new solutions quickly and build turn-key solutions to common needs, but Terraform doesn't; write it in Go or HCL or get bent.
- You have to explicitly pass variable inputs to module blocks; the tool won't implicitly pick up a variable that was already passed to it. (This isn't the case when applying a module directly, only when you instantiate it as a sub-module block.) It just makes initial development and refactoring take more time without giving the user any added benefit.
- You have to explicitly declare variables rather than just inherit them as passed to the tool at runtime. Mind you, you don't have to actually include the variable type; you just have to declare *the name* of the variable. So again, it wastes the user's time during development or refactoring, for absolutely no benefit at all.
- You have to bootstrap the initial remote-backend state resources *outside* of Terraform; or do it with local state and then migrate the state after adding the new resources; or use a separate, identical module that has a backend configuration. Does that sound complicated? It is, and annoying, and unnecessary.
- You have to be careful not to make your module too big, because modules that manage too many resources take too long to plan and apply, and risk dying before completing. (If you're managing resources in China, make the module even smaller, because timeouts over the Great Firewall are so common that it's nearly impossible to finish applying in a reasonable time)
- Tests. In Go.
- Schema for your tfvars files? Nope; write some really complicated logic in a variable to validate each variable in a different way.
- Providers don't document restrictions like the naming conventions for required parameters, so you apply against the API, get back a weird error, and go try to dig up docs that hopefully tell you the naming convention so you can fix it and try again.
- Terraform *plan* will give you 'known after apply' for values it very easily could tell you *before* the apply, but for whatever reason doesn't. You never really know what it's going to do until you do it and it blows up production.
- It's very difficult (sometimes nearly impossible) to just absorb the current state of the infrastructure into TF (as in, "it's working right now, please just keep it the way it is"). Import only works if you've already written the HCL for the resources, and then looked up how the provider wants you to import each one.
- Version pinning is handled about 5 different ways, yet it's still impossible to pin and use different sets of versions when applying different state files for the same HCL module code and values.
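To illustrate the HCL logic-knots bullet above: even something like "append an element to a list inside each map entry, then feed the result to a module for_each" takes nested comprehensions. A sketch with made-up variable and module names:

```hcl
variable "teams" {
  # Made-up input shape: each team carries a list of base policies.
  type = map(object({
    policies = list(string)
  }))
}

locals {
  # Append a default policy to every team's list, producing a map that
  # for_each can consume (it only accepts maps or sets).
  teams_with_defaults = {
    for name, team in var.teams :
    name => concat(team.policies, ["default-deny"])
  }
}

module "team_policies" {
  source   = "./modules/team-policy"   # hypothetical module
  for_each = local.teams_with_defaults

  team     = each.key
  policies = each.value
}
```

Module-level for_each only arrived in Terraform 0.13, which is what the "impossible a few years ago" aside refers to.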
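On the "can't use variables in a backend block" bullet: the usual workaround is "partial configuration", where the backend block is left mostly empty and the values are injected at init time. A sketch; the bucket and key names are made up:

```hcl
# backend.tf -- everything in here must be a literal; var.* is rejected.
terraform {
  backend "s3" {
    # bucket, key, region intentionally omitted ("partial configuration");
    # they get supplied at init time instead, per environment:
    #   terraform init \
    #     -backend-config="bucket=my-state-bucket" \
    #     -backend-config="key=prod/terraform.tfstate" \
    #     -backend-config="region=us-east-1"
  }
}
```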
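On the tfvars-schema bullet: the closest thing on offer is a validation block per variable, each with its own hand-rolled condition. A sketch with a made-up variable:

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment name."

  # There's no schema file; every constraint is hand-written logic like
  # this, repeated separately for each variable you want checked.
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```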
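On the import bullets: even with the newer declarative import blocks (Terraform 1.5+), you still have to hand-write the full matching HCL and look up the provider-specific ID format before import does anything. A sketch with a hypothetical bucket:

```hcl
# You still write the complete resource block yourself first...
resource "aws_s3_bucket" "logs" {
  bucket = "my-existing-logs-bucket"   # hypothetical existing bucket
}

# ...then tell Terraform to adopt it, using whatever ID format the
# provider documents for this resource type (here, the bucket name).
import {
  to = aws_s3_bucket.logs
  id = "my-existing-logs-bucket"
}
```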
That's off the top of my head; there's much more.