People started using K8s for training, where you already had a network isolated cluster. Extending the K8s+container pattern to multi-tenant environments is scary at best.
I didn't understand the following part though.
> Instead, we burned months trying (and ultimately failing) to get Nvidia’s host drivers working to map virtualized GPUs into Intel Cloud Hypervisor.
Why was this part so hard? Doing PCI passthrough with the Cloud Hypervisor (CH) is relatively common. Was it the transition from Firecracker to CH that was tricky?