Yeah, this isn't easy to get right at the moment, so there's no silver bullet here. We had to iterate on our runner a lot to get it right, but we have a lot of experience since we do this for Terraform Cloud too.
Answering your questions:
> * I assume you aren't shelling out :). Do you have any additional helper libraries on top of the Terraform code base to make it more of a programmatically consumable API, as opposed to an end user application?
We are, in fact, shelling out. There are a lot of security concerns you have to consider when doing this. We published a library to make it easier: https://github.com/hashicorp/terraform-exec
> * Are you still pointing at a directory with resources defined in HCL, or are the resources defined programmatically?
HCL, mixed with the JSON flavor of HCL for the programmatically generated parts. Variables are also programmatically generated, in JSON format.
> * What are you using for state storage?
We output it to a file and handle it in an HCP microservice. We encrypt it with Vault using a customer-specific key and store it in a bucket that only the customer-specific credential has access to. That way, if there is somehow an RCE exploit in our workflows, an attacker can only access that one customer's metadata.
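
To make the per-customer key idea concrete, here is a sketch that uses local AES-GCM from Go's standard library as a stand-in for Vault. It is not how our service calls Vault, and the function names and sample state bytes are made up. It only demonstrates the property that matters: state sealed under one customer's key cannot be opened with another customer's key.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM wraps a 16, 24, or 32 byte key in an AES-GCM AEAD.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealState encrypts state and returns nonce || ciphertext.
func sealState(key, state []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, state, nil), nil
}

// openState reverses sealState; it fails if the key is wrong
// or the data was tampered with.
func openState(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed state too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	keyA, keyB := make([]byte, 32), make([]byte, 32) // two customers' keys
	if _, err := rand.Read(keyA); err != nil {
		panic(err)
	}
	if _, err := rand.Read(keyB); err != nil {
		panic(err)
	}
	sealed, err := sealState(keyA, []byte(`{"example":"state"}`))
	if err != nil {
		panic(err)
	}
	_, errA := openState(keyA, sealed)
	_, errB := openState(keyB, sealed)
	fmt.Println("own key ok:", errA == nil, "other key rejected:", errB != nil)
}
```

In a real setup the key would stay inside the key management service and never reach the workflow process, which is the main reason to delegate encryption rather than do it locally as this sketch does.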
> * What is the execution environment for the programmatic Terraform process? Since Terraform uses external processes for plugins, I've hit some issues with resource constraints around the max number of processes sysctls in a containerized environment where I have multiple Terraform processes running in the same container.
Containers in HCP, and VMs in Terraform Cloud because of its stricter isolation requirements. HCP has less strict requirements because the Terraform configs and inputs are more tightly controlled.