That's fascinating to hear, and I think it would work really well with what we do.
What I'm picturing is that you could run the whole workflow, including the traditional heuristics, on a CPU instance that connects to a GPU on demand.
If you're interested, we'd love for you to try this. We're running a (very unprofitable) beta with a T4 instance + a CPU-only instance for $10/month for those who are willing to help us test this with production workloads. If that sounds useful, I'd love to chat: carl (at) thundercompute (dot) com.