We're already using GPUs on AWS for inference. The big problem is that we have a worker pool of GPU servers fed by a task queue. When we don't have enough servers running, latency goes up, and by the time the EC2 Auto Scaling group adds another server, the queue may already have drained. We're also seeing extra latency on requests originating from Asia, since our servers are currently located in a US region.
So basically all the classic benefits of serverless and edge computing would apply. The difference is that we'd also need extra drivers, Python libraries, and an attached GPU.