> A 4090 amortized over 4 years, working days & hours, is 20 cents per working hour;
But that's not how it works: you need enough capacity to accommodate peak usage, but a good fraction of it won't be running most of the time. You'd end up with a cost not far from what cloud providers charge, which is roughly 3 times that price. And you need to pay for the whole server hosting these GPUs. This is less of a factor with big GPUs like the H100, but if you stick with consumer-grade GPUs, the host is still a non-trivial fraction of the cost, and you're supporting a whole server for a small group of concurrent users, which means your infra team ends up managing a massive pool of servers very quickly, with all the associated costs.
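A back-of-envelope sketch of that argument (the GPU price, utilization rate, and host-overhead figures are all assumptions, picked only to make the amortization math concrete):

```python
# Sketch of the amortization argument above. All figures are assumptions:
# ~$1,600 for an RTX 4090, 4-year lifetime, ~2,000 working hours/year,
# and an illustrative 30% average fleet utilization.

GPU_PRICE_USD = 1600           # assumed street price of a 4090
YEARS = 4
WORKING_HOURS_PER_YEAR = 2000  # ~250 working days * 8 hours

naive_cost_per_hour = GPU_PRICE_USD / (YEARS * WORKING_HOURS_PER_YEAR)
print(f"naive amortized cost: ${naive_cost_per_hour:.2f}/working hour")  # ~$0.20

# But capacity is sized for peak, not average, load. If the fleet sits at
# 30% average utilization, every *used* GPU-hour also carries the cost of
# the idle ones:
AVG_UTILIZATION = 0.30         # assumed duty cycle of the fleet
effective_cost = naive_cost_per_hour / AVG_UTILIZATION
print(f"utilization-adjusted: ${effective_cost:.2f}/used hour")          # ~$0.67

# Add host overhead (CPU, RAM, power, rack space) as a fraction of GPU
# cost -- a large fraction when you put consumer cards in small servers:
HOST_OVERHEAD = 0.5            # assumed: the host adds 50% on top of the GPU
all_in = effective_cost * (1 + HOST_OVERHEAD)
print(f"all-in estimate:      ${all_in:.2f}/used hour")                  # ~$1.00
```

Under those assumptions the naive 20 cents turns into roughly a dollar per actually-used hour, which is how you land in the same ballpark as cloud pricing.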
> It's less than a penny per hour per concurrent on a task like this.
It's still two orders of magnitude more expensive than the serving costs of just about any other SaaS business.
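A rough scale check, with assumed figures on both sides (the penny per concurrent hour from the parent, ~160 active hours a month for a heavy user, and a couple of cents per user per month of infra for a typical CRUD SaaS):

```python
# Rough scale check on the "orders of magnitude" claim. Both per-user
# figures below are assumptions for illustration only.

LLM_COST_PER_CONCURRENT_HOUR = 0.01  # the "penny per hour" from the parent
ACTIVE_HOURS_PER_MONTH = 160         # assumed heavy user: a full working month

llm_monthly = LLM_COST_PER_CONCURRENT_HOUR * ACTIVE_HOURS_PER_MONTH
saas_monthly = 0.02                  # assumed infra cost/user/month for a CRUD SaaS

print(f"LLM serving:  ${llm_monthly:.2f}/user/month")   # ~$1.60
print(f"typical SaaS: ${saas_monthly:.2f}/user/month")  # ~$0.02
print(f"ratio: ~{llm_monthly / saas_monthly:.0f}x")     # ~80x under these assumptions
```

The exact ratio obviously moves with the assumptions, but for a heavy user you end up one to two orders of magnitude above typical SaaS infra cost.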
> Hopefully you're using LLM to deliver value that's worth more than a penny per hour of the people using it.
Maybe, but then you're trying to build a service that has to add much more value than what the typical SaaS start-up provides.
Also regarding this:
> - Most apps are not non-stop token generation for concurrent users-- ChatGPT's duty cycle at this is very low.
ChatGPT is mostly used by people who spend a few minutes per day on it, which is a nice place to be, but:
- this market is already taken by them, so your startup isn't going to win by doing the same thing;
- when you integrate LLMs into tools people use routinely (an IDE being the typical example), the token generation volume skyrockets, as sketched below.
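An illustrative duty-cycle comparison (generation speed, minutes of chat, and the IDE duty cycle are all assumed numbers, just to show how the volume scales):

```python
# Illustrative duty-cycle comparison: casual chat vs. an IDE assistant.
# All numbers are assumptions.

TOKENS_PER_SECOND = 50        # assumed single-stream generation speed

# Casual chat: a few short exchanges per day.
chat_minutes_generating = 3   # assumed minutes of actual generation/day
chat_tokens = chat_minutes_generating * 60 * TOKENS_PER_SECOND

# IDE assistant: completions and edits firing throughout an 8-hour day.
ide_duty_cycle = 0.15         # assumed: generating 15% of the workday
ide_tokens = 8 * 3600 * ide_duty_cycle * TOKENS_PER_SECOND

print(f"chat user: ~{chat_tokens:,} tokens/day")    # ~9,000
print(f"IDE user:  ~{ide_tokens:,.0f} tokens/day")  # ~216,000, roughly 24x more
```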