The basic concept is out there.
Lots of smart people are studying hard to catch up and get poached themselves. No shortage of those, I assume.
Good training data still seems the most important thing to me.
(and lots of hardware)
Or does the specific training still involve lots of smart decisions all the time?
And those small or big decisions make all the difference?
We’d probably see more companies training their own models if it was cheaper, for sure. Maybe some of them would do very well. But even having a lot of money to throw at this doesn’t guarantee success, e.g. Meta’s Llama 4 was a big disappointment.
That said, it’s not impossible to get close to the state of the art, as DeepSeek showed.
The basic concept is out there: run very fast.
Lots of people running every day who could be poached. No shortage of those I assume.
Good running shoes still seem the most important to me.
2. The cost to train is also prohibitive. The Grok data centre has 200,000 H100 graphics cards. It's impossible for a startup to compete with this.
It's funny to me, since xAI is literally the "youngest" in this space and recently made Grok 4, which surpasses all the other frontier models.
It's literally not impossible.
I assume "startup" here means the average one, with a little less funding and fewer connections.
xAI was just spun out to raise more money and fix X's finance issues.
It’s the difference between running a marathon (impressive) and winning a marathon (here’s a giant sponsorship check).
But the truth is that getting experience building models at this scale requires working in a high-level job at a major FAANG/LLM provider. Building what Meta needs is not something you can do in your basement.
The reality is the set of people who really understand this stuff and have experience working on it at scale is very, very small. And the people in this space are already paid very well.
It's very, very rare for a market to be winner-takes-all to such an extreme degree as code LLMs.
Claude does have a slight edge in quality (which is why it's my default) but infrastructure/cost/speed are all relevant too. Different providers may focus on one at the expense of the others.
One interesting scenario we could end up in is using large hosted models for planning/logic and handing off to local models for execution.
Coding startups also try to fine-tune OSS models to their own ends. But this is also very difficult, and usually just done as a cost optimization, not as a way to get better functionality.