It's a bit different for reasoning LLMs: they operate in a feedback loop, measuring the quality of a candidate solution and iterating on it until either the quality meets a desired threshold or the reasoning-effort budget is exhausted.
This loop can correct generation errors, but it cannot correct errors in the quality measurement itself, so the question is valid.
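The loop described above can be sketched roughly as follows. This is a minimal illustration, not a real LLM API: `generate` and `score` are hypothetical callables standing in for the model and its quality measure.

```python
# Minimal sketch of the generate/measure/iterate loop; `generate` and
# `score` are hypothetical stand-ins, not a real LLM interface.
def refine(prompt, generate, score, threshold=0.9, max_effort=8):
    """Generate a draft, measure it, and iterate until the quality
    meets the threshold or the effort budget is exhausted."""
    best, best_q = None, float("-inf")
    critique = None
    for _ in range(max_effort):              # reasoning-effort budget
        draft = generate(prompt, critique)
        q = score(draft)                     # measurement errors propagate here
        if q > best_q:                       # keep the best draft seen so far
            best, best_q = draft, q
        if best_q >= threshold:              # quality meets the bar: stop early
            break
        critique = f"quality={q:.2f}; revise"
    return best, best_q

# Toy stand-ins for demonstration: each retry reveals more of the target.
def _gen(prompt, critique, _state=[0]):
    _state[0] += 1
    return prompt[: _state[0]]

def _score(draft):
    return len(draft) / 5.0

answer, quality = refine("hello", _gen, _score)
```

Note that `refine` trusts `score` completely: a flawed quality measure makes the loop converge confidently on a flawed answer, which is exactly the failure mode the loop cannot self-correct.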