But... it isn't used much outside MLC, is it? And MLC's implementations are basically demos.
I dunno why. AI inference communities are dying for fast multiplatform backends without the fuss of PyTorch.
As a random aside, I hope y'all publish an SDXL repo for local (non-WebGPU) inference. SDXL is too compute heavy to split or offload to the CPU the way llama.cpp does, but it's less RAM heavy than LLMs, and I'm thinking it would benefit from TVM's "easy" quantization.
It would be a great backend to hook into the various web UIs, maybe with the secondary model loaded on an IGP.
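To put rough numbers on the RAM point, here's a back-of-envelope sketch of weight memory at different precisions. The parameter counts are approximate figures I'm assuming for illustration (about 2.6B for the SDXL base UNet, 13B for a mid-sized Llama), and it counts weights only, so treat it as a sanity check rather than a measurement.

```python
# Back-of-envelope weight memory at different precisions.
# Parameter counts are approximate and assumed for illustration only.
PARAMS = {
    "SDXL base UNet (~2.6B)": 2.6e9,
    "Llama 2 13B (~13B)": 13e9,
}

# Bits per weight for each storage format.
BITS = {"fp16": 16, "int8": 8, "int4": 4}


def weight_gib(n_params, bits):
    """Weights only: ignores activations, quantization scales, and runtime overhead."""
    return n_params * bits / 8 / 2**30


def footprint_table():
    """Return {model: {format: GiB}} for every model/format pair above."""
    return {
        name: {fmt: round(weight_gib(n, b), 2) for fmt, b in BITS.items()}
        for name, n in PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{fmt}: {gib} GiB" for fmt, gib in row.items())
        print(f"{name}: {cells}")
```

The gap between the two rows is the point: the diffusion model's weights plausibly fit on a modest GPU once quantized, so the bottleneck is compute rather than memory.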
https://github.com/merrymercy/awesome-tensor-compilers#open-...