This seems unlikely to me, but what is the truth?
I understand that _training_ an LLM is very very expensive. (Although so is spinning up a fab for a new CPU.) But it seems to me the incremental costs to query a model should be relatively low.
I'd love to see your back-of-the-envelope calculations for how much water and especially how much electricity it takes to "answer a single query" from, say, ChatGPT, Claude-3.7-Sonnet or Gemini Flash. Bonus points if you compare it to watching five minutes of a YouTube video or doing a Google search.
Links to sources would also be appreciated.