Data center energy use isn't simple to calculate because servers are configured to process a lot of requests in parallel. You're not getting an entire GPU cluster to yourself while your request is being processed. Your tokens are being processed in parallel with a lot of other people's requests for efficiency.
This is why some providers can offer a fast mode: Your request gets routed to servers that are tuned to process fewer requests in parallel for a moderate speedup. They charge you more for it because they can't fit as many requests into that server.