0 points
popcorncowboy
1y ago
0 comments
> Developers can run inference on Llama 3.1 405B on their own infra at roughly 50% the cost of using closed models like GPT-4o
Does anyone have details on exactly what this means or where/how this metric gets derived?
rohansood15
1y ago
I am guessing these are prices on services like AWS Bedrock (their post is down right now).
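If the ~50% figure does come from hosted per-token pricing, the derivation would just be a ratio of published rates. A minimal sketch of that comparison, using placeholder prices that are assumptions for illustration only (they are not actual quotes from Bedrock, Meta, or OpenAI):

```python
# Placeholder per-token prices in $ per 1M tokens -- assumed values,
# NOT real published rates; substitute current provider pricing.
llama_405b_price = 5.00   # assumed hosted open-model rate
gpt4o_price = 10.00       # assumed closed-model rate

# The "roughly 50% the cost" claim would follow from a ratio like this:
ratio = llama_405b_price / gpt4o_price
print(f"Open-model cost is {ratio:.0%} of the closed-model cost")
```

Under these assumed numbers the ratio comes out to 50%; the real comparison would also need to account for input vs. output token rates and self-hosting overhead (GPUs, utilization), which a single ratio hides.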
PlattypusRex
1y ago
A big chunk of that is probably the fact that you don't have to pay the margin of someone who runs inference off-premises for a profit.