Better HN

0 points · Uehreka · 11mo ago

I once asked Claude Code (Opus 4) to review a codebase I’d built, and at the end of my prompt I threw in something like “No need to be nice about it.”

Now granted, you could say it was “flattering that instruction”, but it sure didn’t flatter me. It absolutely eviscerated my code, calling out numerous security issues (which were real), all manner of code smells and bad architectural decisions, and ended by saying that the codebase appeared to have been thrown together in a rush with no mind toward future maintenance (which was… half true… maybe more true than I’d like to admit).

All this to say that it is far from obvious that LLMs are intrinsically bad critics.


4 comments · 2 top-level

colonCapitalDee · 11mo ago · 2 in thread

The problem isn't that LLMs can't be critical, it's that LLMs don't have taste. It's easy to get an LLM to give praise, and it's easy to get an LLM to give criticism, but getting an LLM to praise good things and criticize bad things is currently impossible for non-trivial inputs. That's not to say that prompting your LLM to generate criticism is useless. It's just that any LLM prompted to generate criticism is going to criticize things that are actually fine, just like an LLM prompted to generate praise (which is effectively the default behavior) is going to praise things that are deeply not fine.

bubblyworld · 11mo ago

Absolutely matches my experience. It can still be super helpful, but AIs have an extreme version of anchoring bias.

jauhar_ · 11mo ago

Another issue is that the behaviour of the LLMs is not very consistent.

Herring · 11mo ago

I have an idea. What if we used a third LLM to evaluate how good the secondary LLM is at critiquing the primary LLM?
