That reminds me of when Cloudflare launched their Workers GPU product (Workers AI). It was specifically aimed at running models, and the pricing was abstracted and based on model output. Did you look at what they were doing when building GPU machines?
https://blog.cloudflare.com/workers-ai/