Sure it does:
- Most apps aren't generating tokens non-stop for every concurrent user -- even ChatGPT's per-user duty cycle is very low.
- A 4090 amortized over 4 years of working days and hours comes to about 20 cents per working hour, roughly the same as the cost of the power going into it. On a task like this, that works out to less than a penny per hour per concurrent user.
- Hopefully you're using the LLM to deliver value worth more than a penny per hour to the people using it.
- If you hit massive scale and want to buy A100s to improve the economics because you're drowning in business, you can readily do that when the time comes...
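The amortization math above can be sketched in a few lines. Every number here is an assumption (GPU street price, 8-hour/250-day working schedule, ~450 W draw, electricity rate, concurrency level), not a measurement:

```python
# Back-of-envelope cost per hour per concurrent user.
# All inputs are assumed round numbers, not benchmarks.
gpu_price = 1600.0                   # USD, rough RTX 4090 street price
years = 4
working_hours_per_year = 8 * 250     # 8 h/day, ~250 working days/year
total_hours = years * working_hours_per_year          # 8000 h

hardware_cost_per_hour = gpu_price / total_hours      # $0.20/h

power_draw_kw = 0.45                 # ~450 W under full load
electricity_rate = 0.15              # USD per kWh, assumed
power_cost_per_hour = power_draw_kw * electricity_rate  # ~$0.07/h at 100% duty cycle

concurrent_users = 30                # assumed batch of concurrent sessions
cost_per_user_hour = (hardware_cost_per_hour + power_cost_per_hour) / concurrent_users

print(f"hardware: ${hardware_cost_per_hour:.2f}/h")
print(f"power:    ${power_cost_per_hour:.3f}/h")
print(f"per concurrent user: ${cost_per_user_hour:.4f}/h")
```

With these assumptions the per-user cost lands under a cent per hour, and the real figure is lower still since the duty cycle is nowhere near 100%.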