Meta is open sourcing AITemplate, an inference engine for both Nvidia and AMD GPUs. Code: https://github.com/facebookincubator/AITemplate.
AITemplate delivers much better perf (1.9x ~ 12.8x) compared to PyTorch eager on SOTA models, including BERT, ResNet, ViT and Stable Diffusion.
AITemplate also delivers high perf numbers on AMD GPUs (MI-250). With AITemplate, MI-250 achieves 80% ~ 96% of A100 perf on various ResNet / BERT / ViT models.
AITemplate uses sophisticated fusion techniques to optimize perf, including vertical, horizontal, and memory fusions.
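To make "vertical fusion" a bit more concrete, here is a toy sketch in plain Python. It is not AITemplate code and says nothing about how AITemplate implements fusion; the function names `unfused` and `vertically_fused` are made up for illustration, and lists stand in for GPU tensors. The idea is just that a chain of elementwise ops can run in one pass instead of writing an intermediate buffer after each op.

```python
# Illustrative only: plain Python lists stand in for GPU tensors,
# and each list comprehension stands in for one kernel launch.

def unfused(x, bias):
    # Three separate "kernels", each materializing a full intermediate buffer.
    added = [v + b for v, b in zip(x, bias)]   # kernel 1: bias add
    activated = [max(0.0, v) for v in added]   # kernel 2: ReLU
    scaled = [2.0 * v for v in activated]      # kernel 3: scale by 2
    return scaled

def vertically_fused(x, bias):
    # One "kernel": the same op chain applied per element,
    # with no intermediate buffers written in between.
    return [2.0 * max(0.0, v + b) for v, b in zip(x, bias)]

x = [1.0, -2.0, 3.0]
bias = [0.5, 0.5, -4.0]
assert unfused(x, bias) == vertically_fused(x, bias)
```

On a real GPU the win would come from fewer kernel launches and less memory traffic, which this toy version obviously can't show; it only demonstrates that the fused and unfused forms compute the same thing.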
btw, I'm one of the authors of AITemplate, happy to answer any questions.
Edit: link for TVM https://tvm.apache.org/
We don't have an official comparison between AITemplate and TVM / ONNX for now, but we do have perf numbers like https://github.com/facebookincubator/AITemplate/tree/main/ex..., https://github.com/facebookincubator/AITemplate/tree/main/ex.... Feel free to run these examples on other frameworks and compare perf.
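If you want to compare frameworks yourself, a minimal timing harness could look something like the sketch below. This is not taken from the AITemplate repo; `benchmark` and its `warmup` / `iters` parameters are hypothetical names, and it only measures wall-clock time for whatever callable you hand it. For GPU workloads you would also need to synchronize the device before reading the clock, which is left out here.

```python
import time

def benchmark(fn, warmup=3, iters=10):
    """Return the mean wall-clock seconds per call of fn (hypothetical helper)."""
    # Warm-up calls are excluded from timing (caches, lazy init, JIT, etc.).
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iters

# Stand-in workload; swap in a call that runs one inference in your framework.
mean_seconds = benchmark(lambda: sum(range(10_000)))
print(f"mean time per call: {mean_seconds * 1e3:.3f} ms")
```

Run the same harness, with the same inputs and batch size, around each framework's inference call so the numbers are at least roughly comparable.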
More benchmark numbers and repro at: https://github.com/facebookincubator/AITemplate/tree/main/ex...
One or two more optimizations and we're gonna have live-update results.
Would AITemplate be able to run with those constraints?
Thank you so much for your post! I would be very grateful for the response!
P.S. Though it should be 1.4 seconds: 0.7 * 2 = 1.4, if you figure twice the steps means twice the time.
Maybe this is to attract better engineers, but all in all this has been a net positive for software development. So credit where it is due.
Of course I would argue there's a better way to provide these kinds of services that concentrates power less, and that's decentralization with cryptoeconomic incentives to maintain consensus, but for their generation, they did well.