I just had to implement GPU clustering in my inference stack to support Llama 3.1 70B, and even then I needed 2x A100 80GB SXMs.
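For anyone wondering why 70B needs two 80GB cards, here's a rough back-of-envelope in Python (a sketch only; the bytes-per-parameter figures are assumptions, and KV cache plus runtime overhead add more on top):

    # Rough VRAM needed just to hold a dense model's weights, ignoring KV cache
    # and framework overhead. Bytes-per-parameter figures are approximations.
    def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
        return params_billion * 1e9 * bytes_per_param / 1e9

    print(weight_vram_gb(70, 2.0))   # ~140 GB at fp16/bf16 -> doesn't fit on one 80GB A100
    print(weight_vram_gb(70, 0.5))   # ~35 GB at int4 -> roughly fits on a single 48GB card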
I was initially running my inference servers on fly.io because they were so easy to get started with, but I eventually moved elsewhere because the prices were so high. I pointed out to someone there who e-mailed me that it was really expensive compared to other providers, and they basically just waved me away.
For reference, you can get an A100 SXM 80GB spot instance on Google Cloud right now for $2.04/hr ($5.07 on-demand).
An H100 will also be much faster, especially if you are willing to use fp8. Maybe 3-4x.
Savage.
I wonder if we’ll see a resurgence of cloud game streaming.
Amazon’s g6 instances are L4-based with 24GB of VRAM, half the capacity of the L40S, with SageMaker on-demand prices around this rate. Vast.ai is cheaper, though it works a bit more like bidding and availability varies.
That's the medium Llama. Does anyone know if an L40S would run the 405B version?
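Back-of-envelope, it looks unlikely on a single card (a sketch; the bytes-per-parameter figures are rough assumptions, and KV cache adds more): even heavily quantized, the 405B weights alone exceed an L40S's 48GB several times over, so it would need a multi-GPU setup.

    # Approximate weight-only memory for a 405B-parameter model at different precisions.
    params = 405e9
    for label, bytes_per_param in [("fp16", 2.0), ("fp8", 1.0), ("int4", 0.5)]:
        gb = params * bytes_per_param / 1e9
        print(f"{label}: ~{gb:.0f} GB of weights")  # ~810 / ~405 / ~202 GB, all >> 48 GB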
One cost factor we have that other providers might not (I'd love to know): we have to dedicate individual racked physical hosts to each group of GPUs we deploy, because we don't (or can't, depending on how you think about systems security) allow GPU-enabled workloads to share hardware with non-GPU workloads, and we don't allow anyone to share kernels.
But like we said in the post: we're still figuring this stuff out. What we know is: at the same price level, we're consistently sold out of A10 inventory.
Ya, that's a no from me.
They run on literally anything someone installs their agent on.
This all happened because we were having internal meetings about trying to find A10s to rack, and Kurt stopped and said "wtf are we doing?"
If it'll make you feel better, we'll continue to charge you the previous list price for L40S GPU hours.
Nice business to be in, I guess.