Better HN

0 points · jmalicki · 6d ago · 0 comments

I find these internet arguments talking about LLMs as if they are trained by reading the internet to be wild.

Yes, pretraining still exists. But for the past few years, pretraining by reading the internet is just the initial bootstrapping of LLM training. The RL training they get from bespoke training data, with very very different characteristics than what these armchair analyses claim, dominates these days.

23 comments · 5 top-level

MattRogish · 6d ago · 10 in thread

I'd have to imagine there are wildly diminishing marginal returns to additional SFT/post-training passes.

There are a bounded number of (useful) derivations/combinations of Duff's device.

If frontier labs wish to reduce hallucinations on factual matters, they (or their data providers) will have to hire people to do fundamental research above and beyond what is available in the extant literature and on the web. I.e., if the LLMs want to lower precision error, they need to go out and actually find more expertise. If the Wikipedia page for Pompey lacks data, where are they going to get it from? How would they even _identify_ that the page has holes?

Yes, they can digitize more books, but that is untrustworthy data: if there were enough eyeballs on a particular work, it would be on the internet. If it's not, they'd need to hire the experts themselves. They need expert reviewers in virtually every interesting topic, which fundamentally is an intractable problem, especially since things change all the time. Maybe even uninteresting topics, too?

I dunno, it doesn't seem to me that "more data" is the magic bullet here. Yeah, it will "help", but we're already on the flat part of the S-shaped curve.

My take from trying to understand this stuff is some sort of algorithmic improvement is necessary to get another step change in how well LLMs perform in this area. I could be wrong!

jmalicki (OP) · 6d ago

As a side gig, I write novel software that solves problems no existing software does, that existing LLMs have difficulty reproducing, purely for the purpose of existing as LLM training data.

There are journalists being hired to write Atlantic-worthy articles that exist only as LLM training data, because they're getting paid more than the Atlantic would pay them for it.

It's insane.

Yes, they are hiring the experts themselves. To create new knowledge above and beyond what's on the internet. To be locked away as LLM training data.

The defining characteristic of all this new data is that it is targeted at LLMs' weak points.

It's not just more data; it's custom tutorials built for what LLMs struggle with.

palmotea · 6d ago

> There are journalists being hired to write Atlantic-worthy articles that exist only as LLM training data, because they're getting paid more than the Atlantic would pay them for it.

I kinda doubt that "Atlantic-worthy" quality claim. While I have no doubt such articles are written solely as training data, I'd expect their quality to be much lower than the real thing, since there's no public to critique them, probably little reputational risk for errors, and no professional ethics to uphold. Even if professional journalists were hired to do the work, I'd expect them to start phoning it in pretty quickly, and to skimp on fact-checking especially.

MattRogish · 6d ago

I'm not saying they are not trying. I'm saying we're inventing new problems faster than any lab can:

1) Identify the gaps

2) Determine how to fix them

3) Implement a fix (especially if that fix is: identify and find experts)

4) And judge the result

How do they know [person] is an expert in [some field]? How do they find that person? How many experts are necessary to give the right information? How do we evaluate the results, especially if it's novel?

You can find a lot of people who disagree on many topics, and those turtles go all the way down.

I don't disagree that your work will help reduce hallucinations and improve model performance! It will.

I predict (I hope I'm wrong!) that we're going to hit some asymptote that is not at 0% hallucinations (and I would even put a substantial nonzero probability that "overall" hallucination rate bottoms out at some minimum and then slowly grows because we just can't keep up with the new garbage we throw at it).

ayewo · 6d ago

1. How did you land the side gig? Mercor or a lesser-known brand?

2. What criteria do such vendors typically require?

victorbjorklund · 6d ago

What kind of programs? Can you give an example of the tasks?

calla · 5d ago

Hi, I would like to hear more about this. Where is the citation? Or can you send more information? Who is paying?

calla · 5d ago

Hi, so Mercor is paying them? What's the prompt? Need more info!

coldtea · 5d ago

> There are journalists being hired to write Atlantic-worthy articles that exist only as LLM training data, because they're getting paid more than the Atlantic would pay them for it.

So now part of the training is directly slop - just of the pre-AI variety?

giardini · 6d ago

jmalicki says many things, among them being

"As a side gig, I write novel software that solves problems no existing software does,"

and

"Yes, they are hiring the experts themselves. To create new knowledge above and beyond what's on the internet. To be locked away as LLM training data."

More likely you're joking and/or paranoid! 8-))

YeGoblynQueenne · 6d ago

>> They need expert reviewers in virtually every interesting topic, which fundamentally is an intractable problem, especially since things change all the time.

How odd. It's Expert Systems and the Knowledge Acquisition Bottleneck all over again.

mcphage · 6d ago · 6 in thread

Where do they get the bespoke training data from? And how much? I don’t really know anything about this.

jmalicki (OP) · 6d ago

> And how much?

Mercor, one of the larger vendors for contracting with experts to create bespoke data, says on their webpage they're paying $3M/day to their contractors for data.

That alone is roughly $1.1 billion a year, so across vendors it is plausibly well into the billions of dollars a year for bespoke training data.
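
As a rough check on the figure quoted above, the daily rate can be annualized. This is only a back-of-the-envelope sketch: it assumes the $3M/day rate holds on every day of a 365-day year, which the comment does not state, and it covers this one vendor only.

```python
# Back-of-the-envelope annualization of the quoted payout rate.
DAILY_PAYOUT_USD = 3_000_000  # "$3M/day", as quoted in the comment above
DAYS_PER_YEAR = 365           # assumption: the rate holds every calendar day

annual_payout_usd = DAILY_PAYOUT_USD * DAYS_PER_YEAR
print(f"${annual_payout_usd / 1e9:.3f}B per year")  # prints "$1.095B per year"
```

So a single vendor at that rate lands at about $1.1B a year; reaching "billions" takes several vendors of similar size.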

That's also ignoring the RLVR data labs can get from software - they can use the vibe coding sessions as training data as well without paying more.

They are just one of many.
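
For readers unfamiliar with the acronym, RLVR is commonly expanded as reinforcement learning with verifiable rewards: the reward comes from a check a program can run, such as a test suite, rather than from a human rating. The following is a minimal sketch of that idea, not any lab's actual pipeline. The function name `verifiable_reward` and the test-case format are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical test-case format: (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def verifiable_reward(candidate: Callable, tests: Sequence[TestCase]) -> float:
    """Return 1.0 only if the candidate passes every test, otherwise 0.0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failure, not as partial credit.
            return 0.0
    return 1.0


def good_add(a, b):
    return a + b


def bad_add(a, b):
    return a - b


tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(verifiable_reward(good_add, tests))  # 1.0
print(verifiable_reward(bad_add, tests))   # 0.0
```

The binary pass/fail signal is what makes code a cheap domain for this kind of training: no expert has to grade each attempt.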

blovescoffee · 6d ago

Companies like Mercor sell data from human experts.

trothamel · 6d ago

Offhand, do you know what format that data is in? Is it a question and then a human answering that question? Mostly just curious as to what the training data consists of.

dominotw · 6d ago

Meta has reallocated a significant portion of their staff to generating this.

sroussey · 6d ago

Meta also reportedly took a 49% nonvoting stake in Scale AI in June 2025 for about $14.3–$14.8 billion.

coldtea · 5d ago

The same company that let their flagship product go to waste, then bet tens of billions on the stupid VR metaverse that went nowhere?

It's not like they really inspire much confidence in their future predictions.

jgalt212 · 6d ago · 2 in thread

Outside of games and coding, generating enough valid examples and counter-examples to harness the power of RL is cost-prohibitive.

jmalicki (OP) · 6d ago

Which is why rubrics as rewards are used.
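
To make "rubrics as rewards" concrete: instead of a single pass/fail check, a response is scored against a weighted list of criteria, and the weighted fraction satisfied becomes the reward. The sketch below is illustrative only. The `Criterion` class, the `rubric_reward` function, and the example criteria are all hypothetical, and the keyword checks stand in for what would in practice be a human or model grader.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    description: str
    weight: float                 # assumed positive
    check: Callable[[str], bool]  # stand-in for a human or model grader


def rubric_reward(response: str, rubric: List[Criterion]) -> float:
    """Weighted fraction of rubric criteria the response satisfies, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(response))
    return earned / total


# Hypothetical rubric; real rubrics would be written by domain experts.
rubric = [
    Criterion("Cites a source", 2.0, lambda r: "according to" in r.lower()),
    Criterion("Admits uncertainty", 1.0, lambda r: "uncertain" in r.lower()),
    Criterion("Stays under 50 words", 1.0, lambda r: len(r.split()) <= 50),
]

full = "According to the manual, the flag is optional, but I am uncertain about older versions."
terse = "The flag is optional."

print(rubric_reward(full, rubric))   # 1.0
print(rubric_reward(terse, rubric))  # 0.25
```

The cost question raised in this thread then shifts from labeling every example to writing good rubrics and paying for the grader that applies them.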

jgalt212 · 6d ago

Still cost-prohibitive.

jazzdev · 6d ago

In the last few weeks Claude (Sonnet) has told me “I don’t know” 3 different times. That seems like the solution to hallucinations and it’s already happening.

dominotw · 6d ago

let me take down armchair analysis with my armchair analysis
