oh we use cloud gpus, infiniband h100s absolutely aren't something we want to self-host. not aws tho, they're crazy overpriced; we use mithril and sfcompute!
we also use cloudflare extensively for everything that isn't the core heap dataset. the convenience of buckets is totally worth it for most day-to-day usage.
the heap is really just the main pretraining corpus and nothing else.