> A 4090 amortized over 4 years, working days & hours, is 20 cents per working hour;
But that's not how it works: you need enough capacity to accommodate peak usage, but a good fraction of it won't be running most of the time. You'd end up with a cost not far from what cloud providers charge, which is roughly 3 times that price. And you need to pay for the whole server hosting these GPUs. This is less of a factor with big GPUs like the H100, but if you stick with consumer-grade GPUs, the host is still a non-trivial fraction of the cost, and you're supporting a whole server for a small group of concurrent users, which means your infra team ends up managing a massive pool of servers very quickly, with all the associated costs.
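A back-of-envelope sketch of that argument (the GPU price, utilization rate, and host-overhead figures are all assumptions, picked only to make the amortization math concrete):

```python
# Sketch of the amortization argument above. All figures are assumptions:
# ~$1,600 for an RTX 4090, 4-year lifetime, ~2,000 working hours/year,
# and an illustrative 30% average fleet utilization.

GPU_PRICE_USD = 1600           # assumed street price of a 4090
YEARS = 4
WORKING_HOURS_PER_YEAR = 2000  # ~250 working days * 8 hours

naive_cost_per_hour = GPU_PRICE_USD / (YEARS * WORKING_HOURS_PER_YEAR)
print(f"naive amortized cost: ${naive_cost_per_hour:.2f}/working hour")  # ~$0.20

# But capacity is sized for peak, not average, load. If the fleet sits at
# 30% average utilization, every *used* GPU-hour also carries the cost of
# the idle ones:
AVG_UTILIZATION = 0.30         # assumed duty cycle of the fleet
effective_cost = naive_cost_per_hour / AVG_UTILIZATION
print(f"utilization-adjusted: ${effective_cost:.2f}/used hour")          # ~$0.67

# Add host overhead (CPU, RAM, power, rack space) as a fraction of GPU
# cost -- a large fraction when you put consumer cards in small servers:
HOST_OVERHEAD = 0.5            # assumed: the host adds 50% on top of the GPU
all_in = effective_cost * (1 + HOST_OVERHEAD)
print(f"all-in estimate:      ${all_in:.2f}/used hour")                  # ~$1.00
```

Under those assumptions the naive 20 cents turns into roughly a dollar per actually-used hour, which is how you land in the same ballpark as cloud pricing.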
> It's less than a penny per hour per concurrent on a task like this.
It's still two orders of magnitude more expensive than the serving costs of just about any other SaaS business.
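A rough scale check, with assumed figures on both sides (the penny per concurrent hour from the parent, ~160 active hours a month for a heavy user, and a couple of cents per user per month of infra for a typical CRUD SaaS):

```python
# Rough scale check on the "orders of magnitude" claim. Both per-user
# figures below are assumptions for illustration only.

LLM_COST_PER_CONCURRENT_HOUR = 0.01  # the "penny per hour" from the parent
ACTIVE_HOURS_PER_MONTH = 160         # assumed heavy user: a full working month

llm_monthly = LLM_COST_PER_CONCURRENT_HOUR * ACTIVE_HOURS_PER_MONTH
saas_monthly = 0.02                  # assumed infra cost/user/month for a CRUD SaaS

print(f"LLM serving:  ${llm_monthly:.2f}/user/month")   # ~$1.60
print(f"typical SaaS: ${saas_monthly:.2f}/user/month")  # ~$0.02
print(f"ratio: ~{llm_monthly / saas_monthly:.0f}x")     # ~80x under these assumptions
```

The exact ratio obviously moves with the assumptions, but for a heavy user you end up one to two orders of magnitude above typical SaaS infra cost.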
> Hopefully you're using LLM to deliver value that's worth more than a penny per hour of the people using it.
Maybe, but then you're trying to build a service that has to add much more value than what the typical SaaS start-up provides.
Also regarding this:
> - Most apps are not non-stop token generation for concurrent users-- ChatGPT's duty cycle at this is very low.
ChatGPT is mostly used by people who spend a few minutes per day on it, which is a nice place to be, but:
- this market is already taken by them, so your startup isn't going to win by doing the same thing;
- when you integrate LLMs into tools people use routinely (an IDE being the typical example), the token generation volume skyrockets, as sketched below.
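An illustrative duty-cycle comparison (generation speed, minutes of chat, and the IDE duty cycle are all assumed numbers, just to show how the volume scales):

```python
# Illustrative duty-cycle comparison: casual chat vs. an IDE assistant.
# All numbers are assumptions.

TOKENS_PER_SECOND = 50        # assumed single-stream generation speed

# Casual chat: a few short exchanges per day.
chat_minutes_generating = 3   # assumed minutes of actual generation/day
chat_tokens = chat_minutes_generating * 60 * TOKENS_PER_SECOND

# IDE assistant: completions and edits firing throughout an 8-hour day.
ide_duty_cycle = 0.15         # assumed: generating 15% of the workday
ide_tokens = 8 * 3600 * ide_duty_cycle * TOKENS_PER_SECOND

print(f"chat user: ~{chat_tokens:,} tokens/day")    # ~9,000
print(f"IDE user:  ~{ide_tokens:,.0f} tokens/day")  # ~216,000, roughly 24x more
```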