modal.com’s modal.Sandbox could serve as the compute layer for this. It uses gVisor under the hood.
This could be useful for building a self-hosted "computer use" setup with Ollama and a multimodal model.
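
A rough sketch of what the sandbox side could look like, assuming the Modal Python client is installed and authenticated. The action schema, the helper names (`build_command`, `run_in_sandbox`), and the app name are made up for illustration. The Ollama/model call that would produce the action is left out.

```python
def build_command(action: dict) -> list[str]:
    """Turn a model-proposed action into an argv list.

    The {"type": "shell", "command": ...} schema is hypothetical,
    not something Modal or Ollama defines.
    """
    if action.get("type") != "shell":
        raise ValueError(f"unsupported action type: {action.get('type')!r}")
    # Run through bash so the model can use pipes, redirects, etc.
    return ["bash", "-c", action["command"]]


def run_in_sandbox(argv: list[str], app_name: str = "computer-use-demo") -> str:
    """Run argv inside a fresh Modal Sandbox and return its stdout."""
    # Imported lazily so the rest of the module loads without Modal installed.
    import modal

    app = modal.App.lookup(app_name, create_if_missing=True)
    sb = modal.Sandbox.create(
        app=app,
        image=modal.Image.debian_slim(),
        timeout=300,  # assumption: five minutes is plenty for one step
    )
    try:
        proc = sb.exec(*argv)
        return proc.stdout.read()
    finally:
        # Always tear the sandbox down, even if the command fails.
        sb.terminate()


if __name__ == "__main__":
    # In a real loop this action would come from the multimodal model.
    action = {"type": "shell", "command": "uname -a"}
    print(run_in_sandbox(build_command(action)))
```

Creating one sandbox per step keeps the sketch short. A real agent loop would probably keep a single sandbox alive across steps so state persists between actions.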