Groq CEO: 'We No Longer Sell Hardware'

> If you have to sell the hardware and support people buying it, that is a world of trouble.

What is the difference between this and having to sell the cloud access and supporting the people who buy a subscription?

alephnerd2y ago

> What is the difference between this and having to sell the cloud access and supporting the people who buy a subscription

Margins.

Pricing for cloud compute is much higher and servicing and management for the provider is much cheaper.

If I sold hardware directly, I'd often be on the hook for support contracts, which can get pricey with hardware and distract from shipping future-facing product features, since customers who buy directly have longer upgrade windows due to logistical overhead.

themoonisachees2y ago

A lot. It's the same reason Amazon doesn't sell servers and instead gives you access to a single instance that everyone pretends is the same but in reality is massively transient.

ethbr12y ago

> What is the difference between this and having to sell the cloud access and supporting the people who buy a subscription

Knowledge/training.

If you're shipping a brand new hardware arch, exposed as raw hardware, then you're on the hook for training everyone in the world and fixing all their weird edge case uses.

I.e. are you willing to invest in Intel/AMD/Nvidia-scale QA and support?

If you're exposing a PaaS (or even IaaS), then you have some levers you can tweak / mask behind the scenes, so only your team need be experts at low-level operations.

For a fast-paced company, the latter model makes a lot more sense, at least until hardware+software stabilizes.

[0] https://www.evp.cloud/post/diving-deeper-insights-from-our-l...

latchkey2y ago

Good question. Not much. If anything, what I'm doing is even harder because I will have multiple sources for the hardware. I have to deal with all of the hardware and data center issues, as well as the customers who rely on us to provide them access.

Good thing that I'm a glutton for punishment.

winwang2y ago

How's the overall software support for MI300 series? The hardware itself looks great.

(also, +100 to valuing honesty and transparency)

latchkey2y ago

The hardware is actually pretty amazing. 192GB per GPU (or 1.5TB in an 8-GPU chassis) is a game changer.

I'll let you know once I get my hands on them again. There really isn't enough public information about them at all. So far, my friends at ElioVP [0] have published a blog post. Still with not enough detail for my taste, but I'm pretty sure he is limited by what he can talk about. Luckily, I am not.

I mention in another comment below that my current goal is to get a bunch of people to perform testing on them and then publish blog posts along with open source code. This way, we can start a repository of CI/CD tests to see how things improve with time. ROCm 6.1 is rumored to be quite an improvement.

https://www.reddit.com/r/LocalLLaMA/comments/1bpgrdf/wanted_...

rnts082y ago

That's interesting, something that I've been really wanting to get into as well, but where I am there is literally no venture capital to raise for this at the moment. I'd be interested to know more and/or bounce some ideas though.

latchkey2y ago

Extremely capital intensive, but also requires relationships in the industry at many levels. Luckily, I happen to have both and they are crazy enough to put their trust in me. I feel very grateful for that.

c-c-c-c-c2y ago

The GroqCard (RS-GQ-GC1-0109) was in stock at Mouser a few weeks ago and they are still taking orders.

htrp2y ago

that or if you put chips in the hands of your customers, they may start to benchmark it against other equivalent solutions

latchkey2y ago

Funny you should mention that. ;-)

I've got about a dozen people signed up. Just working through some hardware issues right now (see above about high rate of failures), and hope to have this resolved next week, so that I can get people onto them and doing their testing.

https://wow.groq.com/isca-2022-paper

gandalfgeek2y ago· 11 in thread

They're calling out the lie that you need bleeding-edge hardware for performance.

5-year-old silicon (14 nm!!) and no HBM.

Their secret sauce seems to be an ahead-of-time compiler that statically lays out the entire computation, enabling zero contention at runtime. Basically, they stamp out all non-determinism.
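The "statically lays out the entire computation" idea can be illustrated with a toy scheduler. This is a minimal sketch of ahead-of-time static scheduling in general, not Groq's compiler: the op names, the unit names ("mem", "mxm"), and the cycle latencies are hypothetical, and each unit is assumed to be non-pipelined (busy for an op's full latency). Because every latency is assumed known at compile time, each op gets a fixed start cycle and nothing needs to be arbitrated at runtime.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    unit: str              # hypothetical functional unit, e.g. "mem" or "mxm"
    latency: int           # cycles; assumed known exactly at compile time
    deps: tuple = ()       # names of ops whose results this op consumes

def static_schedule(ops):
    """Assign each op a fixed start cycle ahead of time.

    Ops must be given in dependency (topological) order. An op starts once
    all its inputs are ready and its unit is free for its whole latency, so
    the resulting timetable has no resource conflicts left to resolve at runtime.
    """
    finish = {}      # op name -> cycle its result becomes available
    busy = set()     # (unit, cycle) slots already reserved
    start_cycle = {}
    for op in ops:
        start = max((finish[d] for d in op.deps), default=0)
        while any((op.unit, c) in busy for c in range(start, start + op.latency)):
            start += 1  # slide later until the unit is free for the whole op
        busy.update((op.unit, c) for c in range(start, start + op.latency))
        start_cycle[op.name] = start
        finish[op.name] = start + op.latency
    return start_cycle

program = [
    Op("load_w", "mem", 2),
    Op("load_x", "mem", 2),
    Op("matmul", "mxm", 4, ("load_w", "load_x")),
    Op("store", "mem", 1, ("matmul",)),
]
print(static_schedule(program))
# {'load_w': 0, 'load_x': 2, 'matmul': 4, 'store': 8}
```

A production compiler would presumably also have to model pipelining, interconnect timing, and memory banks; the sketch only shows how known latencies turn runtime contention into a compile-time packing problem.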

ipsum22y ago

It's not really a lie, though. They require 20x more chips than Nvidia GPUs (storing all the weights in SRAM instead of HBM is expensive!) for a ~2x speed increase. Overall, the power cost is higher for Groq than for GPUs.

ethbr12y ago

Are power and 20x 14-nm chip capacity limiting factors currently?

It's not inconceivable that's a better trade-off than leading-node and HBM requirements.

ipsum22y ago

Edit: 200x more chips, not 20x.

rattray2y ago

Source? Or how do you know that?

halflings2y ago

No HBM because they use tons of fast SRAM instead. Isn't that the main driver for performance here?

(the way I understood it => it's still cost-effective at scale due to the throughput increase this brings)

gandalfgeek2y ago

> No HBM because they use tons of fast SRAM instead. Isn't that the main driver for performance here?

No doubt fast SRAM helps, but from a computation POV, IMHO it's that they've statically planned the computation and eliminated all locks.

Short explainer here: https://www.youtube.com/watch?v=H77tV1KcWIE (Based on their paper).

germanjoey2y ago

Cost-effective in what sense? Groq doesn't achieve high efficiency, only low latency, and that's not done in a cost-effective way. Compare SambaNova achieving the same performance with 8 chips instead of 568, and with higher precision.

dralley2y ago

So Itanium and its "sufficiently smart compiler" but functional?

ethbr12y ago

From skimming the link above, it seems like they accepted it's extremely difficult (maybe impossible) to generate high ILP from a VLIW compiler on complex hardware (what Itanium tried to do).

So they attacked the hardware side instead and simplified it, mostly by eliminating memory-layer non-determinism / using time-sync'd global memory instructions as part of the ISA(?).

This apparently reduced the difficulty of the compiler problem to something manageable (but no doubt still "fun")... and voila, performance.

torginus2y ago

I wonder if the use of eDRAM (https://en.wikipedia.org/wiki/EDRAM), which is essentially embedding DRAM into a chip made on a logic process would be a good idea here.

eDRAM is essentially a tradeoff between SRAM and DRAM, offering much greater density at the cost of somewhat worse throughput and latency.

There were a couple of POWER CPUs that used eDRAM as L3 cache, but it seems to have fallen out of favor.

grandmczeb2y ago

It fell out of favor because it lost the density advantage in newer processes.

geor9e2y ago· 9 in thread

I don't understand why the comments are trash-talking Groq. They are the fastest LLM inference provider by a big margin. Why would they sell their hardware to any other company for any price? Keep it all for themselves and take over the market. 95% of my LLM requests go to Groq these days because it's 0.25 seconds round trip for a complete answer. In comparison, "Claude Instant" takes about 4 seconds. The other 5% of my requests go to Claude Opus and GPT-4, when I'm willing to wait an excruciating 5+ seconds for a better answer. I hate waiting. Latency is king. Groq wins.

fitzn2y ago

What open source model are you using when you hit groq?

I just benchmarked some perf for some of my larger context window queries last week and groq's API took 1.6 seconds versus 1.8 to 2.2 for OpenAI GPT-3.5-turbo. So, it wasn't much faster. I almost emailed their support to see if I was doing something wrong. Would love to hear any details about your workload or the complexity of your queries.

YetAnotherNick2y ago

It's not much faster for input, but it is something like 10x faster for output (Mixtral vs GPT-3.5). This could enable a completely new mode of interaction with LLMs, e.g. agents.

In most cases, overall response time is dominated by output, since each output token is ~100x slower to produce than an input token is to process.
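A back-of-the-envelope split of a non-streamed response into prefill (input) and decode (output) time shows why output speed dominates. The token counts and speeds below are hypothetical, chosen only to reflect the ~100x per-token input/output gap described above; network and queueing time are ignored.

```python
def response_time_s(input_tokens, output_tokens, prefill_tok_per_s, decode_tok_per_s):
    """Split a non-streamed response time into prompt processing (prefill)
    and token-by-token generation (decode). Network and queueing are ignored."""
    prefill = input_tokens / prefill_tok_per_s
    decode = output_tokens / decode_tok_per_s
    return prefill, decode

# Hypothetical speeds: prefill at 5000 tok/s is 100x the 50 tok/s baseline decode.
for decode_rate in (50, 500):  # baseline vs. ~10x faster output
    prefill, decode = response_time_s(2000, 500, 5000, decode_rate)
    print(f"decode {decode_rate} tok/s: prefill {prefill:.1f}s + decode {decode:.1f}s "
          f"= {prefill + decode:.1f}s")
# decode 50 tok/s: prefill 0.4s + decode 10.0s = 10.4s
# decode 500 tok/s: prefill 0.4s + decode 1.0s = 1.4s
```

Under these assumptions a 10x faster decoder cuts the total from 10.4s to 1.4s, while even a much faster prefill would barely move it.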

bee_rider2y ago

What context did I miss that implies they are using an open source model?

laserbeam2y ago

> 1.6 vs 1.8-2.2 seconds

I believe certain companies would kill for 20% performance improvements on their main product.

EVa5I7bHFq9mnYK2y ago

>> why the comments are trash-talking Groq

they probably bought NVDA stock :)

freediver2y ago

How do you decide which requests to send to gpt4/opus?

huac2y ago

why don't you stream the results?

tpetry2y ago

You still have to wait for the end of the streamed response until you can continue with your task.

zachbee2y ago

If I was developing an AI app, I'd care about quality first before speed. And the open-source models just aren't as good as the closed ones.

mlazos2y ago· 5 in thread

The smoke and mirrors around Groq are finally clearing. The truth is that their system is insanely expensive to maintain: hundreds (> 500 IIRC) of chips to get wild tokens/s, but the power and maintenance expense is crazy high for that number of chips. The TCO just isn’t worth it.

ein0p2y ago

You don't know that. For one thing, their silicon costs are going to be relatively cheap. It's an old, reliable 14nm process, and compared to even Google's TPU this is a relatively simple chip. For another, they _could_ be putting all that silicon to good use, and by all indications they are.

Because there's far less local memory movement, and weights are distributed throughout the system, even this 14nm system could be energy efficient. 9/10ths of all power in a conventional system does not go towards compute; it's wasted moving data back and forth. This is especially bad in transformers, which, because of their size, largely defeat the memory hierarchies the architects worked so hard to perfect. IOW, all your caches are useless and you're unnecessarily wasting 90% of your energy while also getting worse latency and worse throughput (due to memory bus bandwidth constraints). Oops.

These folks seem to be offering something that nobody else does: a feasible, proven way to get out of jail free. I wish them all the success they can get, because all the other currently available architectures are largely unsuitable for high-throughput transformer inference, and they work in spite of their design rather than because of it.

mlazos2y ago

Peak H100 power consumption is 700W. Average power consumption of the Groq card (from their own website) is 240W. With 576 chips, it just doesn’t look good. How much is that millisecond perf gain worth to end users?

That said I think their arch is super interesting. I just think that demo was way too hype when the actual system is pretty impractical.
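The power comparison above works out as follows. The 240W, 700W, and 576-chip figures are the ones quoted in the comment (other comments in the thread cite 568 chips); the 8-GPU H100 server is an assumption added only as a point of scale. Like the comment, it mixes Groq's average figure with the H100's peak figure, which if anything flatters Groq.

```python
GROQ_CARD_AVG_W = 240   # quoted average per Groq card
GROQ_CARDS = 576        # chip count quoted in the comment
H100_PEAK_W = 700       # quoted H100 peak
H100_PER_NODE = 8       # assumption: one 8-GPU server as a comparison point

groq_total_w = GROQ_CARDS * GROQ_CARD_AVG_W   # whole Groq system, average draw
h100_node_w = H100_PER_NODE * H100_PEAK_W     # one 8x H100 node, peak draw

# How many H100s at peak draw the same power as the Groq system at average?
h100_equivalent = groq_total_w / H100_PEAK_W

print(f"Groq system: {groq_total_w / 1000:.1f} kW")
print(f"8x H100 node: {h100_node_w / 1000:.1f} kW")
print(f"Power-equivalent H100 count: {h100_equivalent:.1f}")
# Groq system: 138.2 kW
# 8x H100 node: 5.6 kW
# Power-equivalent H100 count: 197.5
```

Whether that is a bad trade depends on tokens/s per watt, which this arithmetic does not capture.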

[1] https://www.semianalysis.com/p/the-dark-side-of-the-semicond... [2] https://www.extremetech.com/computing/272096-3nm-process-nod...

Oribi2y ago

Why would they want to run it themselves if the TCO didn’t work out?

b-side2y ago

Because they'd rather operate at a loss with high revenue than have zero revenue and a loss?

mrkeen2y ago

I thought that was par for the course these days.

Operate at a loss. Get a big valuation. Cash out.

alted2y ago· 4 in thread

Custom state-of-the-art silicon is ridiculously expensive.

For a minimum run of 100 wafers (~10k chips), Groq may have paid $100M, i.e. ~$10k/chip, purely in amortized design costs.

Chip design (software + engineer time) and fabrication setup (lithography masks) grow exponentially [1][2] with smaller nodes, e.g., maybe $100M for Groq's current 14nm chips to ~$500M for their planned 4nm tapeout. Once you reach mass production (>>1000 wafers, which have ~150 large chips each), wafers are $10k each. On top of this, it takes ~1 year to design then have prototypes made. (These same issues still exist on older slower nodes, albeit not as bad.)

This could be reduced somewhat if chip design software were cheaper and margins were lower, but maybe 20% of this cost is due to fundamental manufacturing difficulty.

(disclosure: I don't work with recent tech nodes myself; this is my best guess)
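The amortization argument can be sketched as a two-term formula: one-time design and mask cost (NRE) spread over every chip, plus wafer cost spread over the chips on each wafer. The figures are the commenter's own rough guesses ($100M NRE, $10k wafers, 100 wafers = 10k chips); yield loss, packaging, and test are ignored, and the $10k wafer price is assumed to hold even at low volume.

```python
def cost_per_chip(nre_usd, wafers, chips_per_wafer, wafer_usd):
    """Amortized per-chip cost: one-time NRE spread over all chips, plus the
    per-wafer fabrication cost spread over the chips on each wafer.
    Ignores yield loss, packaging, and test."""
    chips = wafers * chips_per_wafer
    return nre_usd / chips + wafer_usd / chips_per_wafer

# Figures from the comment (its author's own rough guesses).
NRE = 100_000_000      # ~$100M design + mask cost for the 14nm chip
WAFER_COST = 10_000    # ~$10k per wafer
CHIPS_PER_WAFER = 100  # "100 wafers = 10k chips"

for wafers in (100, 1_000, 10_000):
    print(wafers, round(cost_per_chip(NRE, wafers, CHIPS_PER_WAFER, WAFER_COST)))
# 100 10100
# 1000 1100
# 10000 200
```

At low volume the NRE term dominates completely, which is why a minimum run works out to roughly $10k per chip before fabrication costs even matter.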

latchkey2y ago

> Custom state-of-the-art silicon is ridiculously expensive.

Think about the amount of money being dumped into "AI" at this point. If you've got the technology and people to make stuff faster/better/cheaper, finding investors to dump money into your chip making business is probably not as hard as it was 2 years ago.

Groq is making this change for reasons other than the expense of taping out chips.

shrubble2y ago

The report I read said the latest TSMC node is $17K per wafer. How much less it is for 14nm, I don't know.

karma_pharmer2y ago

The masks are the expensive part, not the wafers.

jkachmar2y ago

i don’t support hardware development directly, but i’m a software infrastructure engineer working adjacent to the teams that do so.

can’t comment on specifics, but imo our hardware team punches above its weight class in terms of # of people and time spent in design.

LoganDark2y ago· 4 in thread

That sucks. I wanted to save up for a couple years and get some hardware for home, but I guess the "AI" space moves so fast you barely get a couple months

SuchAnonMuchWow2y ago

You would need ~250 Groq cards to run a 7B model, since their system doesn't scale down. So if you want to buy their hardware, you need a few million dollars.

Their hardware was never for people at home, but for cloud providers.

LoganDark2y ago

That doesn't sound right. Their public demo ran on 568 LPUs because they had Mixtral-8x7B and LLaMA-70B (45B and 70B respectively). IIRC their cards each have slightly over 200MB of SRAM so this almost exactly checks out.

A 7B model would then be able to run on about 60 LPUs. Even at $20,000 per card that would be only $1.2 million and I highly doubt the cost is actually that high, that's just what DigiKey says the cost of an LPU is, if you're trying to buy just one :)
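The "about 60 LPUs" estimate follows from dividing weight size by per-card SRAM. This is a weights-only lower bound (no activations, KV cache, or duplication), and the 230 MB per card is an assumption standing in for "slightly over 200MB"; the $20,000 card price is the DigiKey figure quoted above.

```python
import math

def chips_needed(params_billions, bytes_per_param, sram_mb_per_chip):
    """Lower bound on chips needed to hold all weights in on-chip SRAM.
    Ignores activations, KV cache, and any weight duplication across chips."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return math.ceil(weight_bytes / (sram_mb_per_chip * 1e6))

SRAM_MB = 230  # assumption: "slightly over 200MB" per card, taken as 230 MB
for name, params in [("7B", 7), ("Mixtral-8x7B (~45B)", 45), ("LLaMA-70B", 70)]:
    print(f"{name}: {chips_needed(params, 2, SRAM_MB)} chips at 16-bit weights")

print(f"7B at $20k/card: ${chips_needed(7, 2, SRAM_MB) * 20_000:,}")
# 7B: 61 chips at 16-bit weights
# Mixtral-8x7B (~45B): 392 chips at 16-bit weights
# LLaMA-70B: 609 chips at 16-bit weights
# 7B at $20k/card: $1,220,000
```

The result is sensitive to the assumed weight precision: at 8-bit weights the same 7B model needs 31 cards by this bound.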

wmf2y ago

Save up for Tenstorrent instead.

LoganDark2y ago

I'll look into it, though seeing "contact us" always makes me think they're not going to sell a single unit to a home user. (With that said, Groq probably wouldn't either. You can technically buy LPUs for $20k each, without an expectation of support, but it takes tens of them to run Mixtral.)

Tenstorrent also looks incredibly Python-specific (as in, everything including their SMI seems mostly Python-based) which doesn't seem promising?

Havoc2y ago· 3 in thread

Given that their hardware is different I can kinda see how they don’t want to deal with supporting customers.

> what do you mean I can’t just drop a CUDA docker image in?

htrp2y ago

if you're a hardware startup that doesn't sell hardware, what are you?

> if you're a hardware startup that doesn't sell hardware, what are you?

A hardware startup that sells cloud access to its hardware. :-)

Havoc2y ago

Hardware startup that produces superior hardware and extracts the benefit in-house?

creato2y ago· 3 in thread

> If customers come with requests for high volumes of chips for very large installations, Groq will instead propose partnering on data center deployment. Ross said that Groq has “signed a deal” with Saudi state-owned oil company Aramco, though he declined to give further details, saying only that the deal involved “a very large deployment of [Groq] LPUs.”

What? How does this make sense?

theturtletalks2y ago

If you read on, Groq said they would only sell hardware to US companies and outside companies would get cloud services, not the LPUs. I think the US government told them to keep the LPUs in-house since they could be the secret sauce for scale.

creato2y ago

I'm not questioning the deployment strategy, I'm wondering why Saudi Aramco wants to access so much compute power that is highly specialized(?) for generative AI workloads. Or is it more general than that?

shrubble2y ago

Oil & gas has large data needs, they had petabyte-scale data 2 decades ago.

BoorishBears2y ago· 3 in thread

Read: We're forcing someone's hand in acquiring us.

Groq is still under a 30-requests-per-minute rate limit, which drops to 10 requests per minute if you use it all day.

Billing has been "coming soon" this whole time, and while they've built out hype-enabling features like function calling, somehow they can't set up a Stripe webhook to collect money for realistic rate limits.

They couldn't scream "we can't service the tiniest bit of our demand" any louder at this point.

Edit: For anyone looking for fast inference without the smoke and mirrors, I've been using Fireworks.ai in production and it's great. 200 tk/s - 300 tk/s is closer to Groq than it is to OpenAI and co.

And as a bonus they support PEFT with serverless pricing.

arthurcolle2y ago

they don't even let us pay them, it's insane

I just have free API access with no ability to add a credit card.

brcmthrowaway2y ago

What are you using all this for? What's the product?

BoorishBears2y ago

I run an AI storytelling site and an AI ideation platform.

The storytelling site alone averaged 27k requests a day this week, about double their current request limit, and it's honestly not even that popular a site.

You can't run much more than a toy project on their current rate limits.
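The "about double" figure checks out if the limit is saturated around the clock. The 30 and 10 requests-per-minute limits and the 27k/day average are from the comments above; nothing else is assumed.

```python
MINUTES_PER_DAY = 24 * 60

def daily_capacity(requests_per_minute):
    """Upper bound on requests per day if the limit is saturated around the clock."""
    return requests_per_minute * MINUTES_PER_DAY

burst_cap = daily_capacity(30)      # 30 req/min
sustained_cap = daily_capacity(10)  # 10 req/min with all-day usage
site_demand = 27_000                # the storytelling site's daily average

print(f"{burst_cap:,} / {sustained_cap:,} requests per day")
print(f"demand is {site_demand / sustained_cap:.1f}x the sustained cap")
# 43,200 / 14,400 requests per day
# demand is 1.9x the sustained cap
```

Real traffic is bursty, so peak minutes, not the daily average, are what would actually hit the limit first.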

vinay_ys2y ago· 2 in thread

This business model is bound to get attacked and suffer a painful exit soon. Here's why:

First, the whole system-of-chips architecture that everyone is talking about will increase the overall SRAM available, keeping more model state in super-fast memory and avoiding trips to slow memory.

Secondly, anyone serious about their data (enterprises) won't be okay with making API calls to Groq. Anyone serious about their data and operating at large scale (consumer internet) won't be okay with making expensive API calls to Groq at scale either.

Their cloud is attractive only as a place to build experimental toy apps on their API, continuing development in this direction while the rest of the major industry players' system-of-chips architectures catch up and solve the SRAM-size and manufacturing-process bottlenecks. Once that's solved, I get more powerful compute for cheaper $$ to deploy on-prem.

So, this cloud strategy is short-lived. I see another pivot on the horizon.

frozenportOP2y ago

>> won't be okay with making API calls to Groq

Linked article:

  If customers come with requests for high volumes of chips for very large installations, Groq will instead propose partnering on data center deployment

sebastiennight2y ago

The same has been said of OpenAI for a couple of years now (that they're just a platform to prototype on before moving on to open source models)...

... and yet, they're still leading the field.

I think it's a bit early to think the field is getting commoditized yet.

dsrtslnd232y ago· 2 in thread

So unless there are new Groq datacenters coming, this is only interesting for North American users. Otherwise, H100-based latency-optimized solutions would be faster, in particular for time-to-first-token-sensitive applications.

LoganDark2y ago

> latency optimized solutions would be faster - in particular for time-to-first-token sensitive applications

Do you have any idea how fast Groq is? Go try it. Consistently over 400 t/s for most of the models that they support, and extremely low latency.

huac2y ago

time to first token != tokens per second

Remember that EU -> US is ~150ms of unavoidable latency, for example. Then your comparison is a local H100 vs. Groq + 150ms latency to first token.
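Time to first token and tokens per second trade off against output length: a nearby, slower server wins on the first token, while a faraway, faster one catches up after enough output. This is a simple linear model; the ~150ms round trip and ~400 tok/s come from the thread, and the 0.10s server TTFTs and the 100 tok/s local H100 rate are hypothetical.

```python
def total_latency_s(n_tokens, ttft_s, tok_per_s, network_rtt_s=0.0):
    """Time until the last of n_tokens arrives: network round trip, server
    time-to-first-token, then output at a steady decode rate (simplified)."""
    return network_rtt_s + ttft_s + n_tokens / tok_per_s

def break_even_tokens(local, remote):
    """Output length beyond which the remote service finishes first.
    Each argument is a (ttft_s, tok_per_s, network_rtt_s) tuple."""
    l_ttft, l_rate, l_rtt = local
    r_ttft, r_rate, r_rtt = remote
    head_start = (r_rtt + r_ttft) - (l_rtt + l_ttft)  # remote's extra delay up front
    per_token_gain = 1 / l_rate - 1 / r_rate           # seconds remote saves per token
    if per_token_gain <= 0:
        return 0.0 if head_start <= 0 else float("inf")
    return max(0.0, head_start / per_token_gain)

local_h100 = (0.10, 100, 0.0)    # hypothetical nearby latency-optimized H100
remote_groq = (0.10, 400, 0.15)  # hypothetical server TTFT; ~400 tok/s, ~150 ms away

print(f"TTFT local {total_latency_s(0, *local_h100):.2f}s, "
      f"remote {total_latency_s(0, *remote_groq):.2f}s")
print(f"remote finishes first beyond ~{break_even_tokens(local_h100, remote_groq):.0f} tokens")
# TTFT local 0.10s, remote 0.25s
# remote finishes first beyond ~20 tokens
```

So under these assumptions the local box only wins for very short, first-token-sensitive replies; different speeds would move the break-even point.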

https://www.tomshardware.com/news/no-sram-scaling-implies-on...

ilaksh2y ago· 1 in thread

I'm not able to get consistent replies from the API. It's lightning fast for like ten minutes and then starts freezing up for several seconds.

I want to use it, but it's been very unreliable. I have been using Claude 3 and thinking about together.ai with Mixtral.

QuadrupleA2y ago

Same, it's great when it's quick / available, but they seem underprovisioned for busy times and I often get long 10-30 second stalls.

zetazzed2y ago· 1 in thread

Man, I want to appreciate a nice new hardware approach, but they say such BS that it is hard to read about them:

> “There might need to be a new term, because by the end of next year we’re going to deploy enough LPUs that compute-wise, it’s going to be the equivalent of all the hyperscalers combined,” he said. “We already have a non-trivial portion of that.”

Really? Does anyone seriously believe they are going to be the equivalent of all hyperscalers in compute next year? (Where Meta alone is at 1 million H100 equivalents.) In the same article where they say it's too hard for them to sell chips? And when they literally don't have a setup to even accept a credit card today?

wmf2y ago

You don't put a million-dollar rack on a credit card. I'm not sure they want retail customers for their API either.

scosman2y ago· 1 in thread

Wildly fast inference. And current chips are 14nm, so there's headroom to get a lot better.

jsheard2y ago

Note that SRAM density doesn't scale at the same rate as logic density, and Groq's "secret sauce" is putting a ton of SRAM on their chips. Their stuff won't necessarily see the full benefits of switching to denser nodes if the bottleneck is how much SRAM they can pack onto each chip.

IIRC the last big jump for SRAM density was at 7nm, so they do still have that card to play, but progress has slowed to a crawl beyond that. TSMC 3nm SRAM is barely denser than TSMC 7nm SRAM.

rnts082y ago

Sounds like they're looking to get bought up to me. I'm sure they could monetize their current hardware, and build to sell just like other niche hardware vendors. Anyone remember the hype around big "cloud" storage boxes 10 years back?

zachbee2y ago

Totally saw this one coming! [1]

I think one major challenge they'll face is that their architecture is incredibly fast at running the ~10-100B parameter open-source models, but starts hitting scaling issues with state-of-the-art models. They need 10k+ chips for a GPT-4-class model, but their optical interconnect only supports a few hundred chips.

[1] https://www.zach.be/p/why-is-everybody-talking-about-groq

pha3922y ago

IMHO, Groq is being shadow acquired by Google

karma_pharmer2y ago

Another casualty of AI KYC.


148 comments

83 comments · 18 top-level

latchkey2y ago· 16 in thread

Interesting, I guess that is why I never got a response back from them about buying their stuff.

> no-one wants to take the risk of buying a whole bunch of hardware

I do!

Nobody has stated it yet, but this is probably great news for Tenstorrent.

Disclosure: building a cloud compute provider starting with AMD MI300x, and eventually any other high end hardware that our customers are asking for.

gpapilion2y ago

It’s basically that their minimum cluster size for a reasonable model requires 8-ish racks of compute.

The cloud provider path sounds riskier since that’s two capital intensive businesses, chip design and production and running a cloud service provider.


ukd12y ago

How do y'all compare to https://tensorwave.com?

latchkey2y ago

¯\_(ツ)_/¯
