LLM Evals Are Just Tests. Why Are We Making This So Complicated?
(cameronwestland.com)
3 points · camwest · 7mo ago · 2 comments
8organicbits · 7mo ago
So, did the tests allow you to build a system that never confused existing features with new features? That seems like the problem statement, but I think I'm only seeing probabilistic testing.
camwest (OP) · 7mo ago
Never? No. Way less likely? Yes!
In dev we do 100 consistency checks and get green. In CI we do 10.
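The reply describes repeated consistency checks against a nondeterministic model: many runs locally, fewer in CI, and a green result that makes a confusion less likely rather than impossible. Here is a minimal Python sketch of that pattern. Everything in it is an assumption for illustration, not the author's actual harness: the function names are hypothetical, `make_fake_llm_check` stands in for a real model call plus assertion, and the `CI` environment variable is used only because many CI providers set it. The last helper puts a number on "way less likely": if all n independent runs pass, per-run failure rates above 1 - (1 - c)^(1/n) are ruled out at confidence c.

```python
import os
import random
from typing import Callable


def consistency_check(check: Callable[[], bool], runs: int) -> tuple[int, int]:
    """Run a nondeterministic check `runs` times and return (passes, runs)."""
    passes = sum(1 for _ in range(runs) if check())
    return passes, runs


def runs_for_environment(dev_runs: int = 100, ci_runs: int = 10) -> int:
    """Pick the repeat count: fewer runs in CI, more in local dev.

    Assumption: a non-empty CI environment variable marks a CI run.
    """
    return ci_runs if os.environ.get("CI") else dev_runs


def max_failure_rate_if_all_green(runs: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-run failure rate p after `runs` straight passes.

    Solves (1 - p) ** runs = 1 - confidence for p, assuming independent runs.
    """
    return 1 - (1 - confidence) ** (1 / runs)


def make_fake_llm_check(success_rate: float, seed: int = 0) -> Callable[[], bool]:
    """Hypothetical stand-in for "call the model, then assert on its output"."""
    rng = random.Random(seed)
    return lambda: rng.random() < success_rate


if __name__ == "__main__":
    runs = runs_for_environment()
    passes, total = consistency_check(make_fake_llm_check(0.99), runs)
    print(f"{passes}/{total} runs passed")
    if passes == total:
        bound = max_failure_rate_if_all_green(total)
        print(f"all green: failure rate likely below {bound:.1%} (95% confidence)")
    else:
        print("at least one run failed: the eval is red")
```

At 95% confidence, 100 all-green runs bound the per-run failure rate at roughly 3%, while 10 green runs only bound it at roughly 26%. That is one way to read a 100-in-dev, 10-in-CI split: CI trades statistical power for speed, and neither count proves "never".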