"For example, we recently internally benchmarked an Inferentia instance (inf1.2xlarge) against a GPU instance with an almost identical spot price (g4dn.xlarge) and found that, when serving the same ResNet50 model on Cortex, the Inferentia instance offered a more than 4x speedup."
https://towardsdatascience.com/why-every-company-will-have-m...