LLM Evals Are Just Tests. Why Are We Making This So Complicated?

(cameronwestland.com)

3 points · camwest · 10mo ago · 2 comments

2 comments · 1 top-level

8organicbits · 10mo ago · 1 in thread

So, did the tests allow you to build a system that never confused existing features with new features? That seems like the problem statement, but I think I'm only seeing probabilistic testing.

camwest (OP) · 10mo ago

Never? No. Way less likely? Yes!

In dev we do 100 consistency checks and get green. In CI we do 10.
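To make the "way less likely" point concrete, here is a rough sketch of what a repeated consistency check could look like, plus the arithmetic behind it. The names `consistency_check`, `undetected_probability`, and `make_flaky_check` are made up for illustration and are not from the linked post. It also assumes each run is independent, which real LLM calls may not be.

```python
import random
from typing import Callable


def consistency_check(check: Callable[[], bool], runs: int) -> bool:
    """Return True only if `check` passes on every one of `runs` attempts."""
    return all(check() for _ in range(runs))


def undetected_probability(failure_rate: float, runs: int) -> float:
    """Chance that `runs` independent attempts all pass (go green)
    even though each attempt fails with probability `failure_rate`."""
    return (1.0 - failure_rate) ** runs


def make_flaky_check(failure_rate: float, seed: int) -> Callable[[], bool]:
    """Hypothetical stand-in for an LLM eval: fails at roughly `failure_rate`."""
    rng = random.Random(seed)
    return lambda: rng.random() >= failure_rate


if __name__ == "__main__":
    flaky = make_flaky_check(failure_rate=0.01, seed=0)
    print("100 runs green:", consistency_check(flaky, runs=100))
    print("P(all green | 1% failure, 100 runs):", undetected_probability(0.01, 100))
    print("P(all green | 1% failure, 10 runs):", undetected_probability(0.01, 10))
```

Under those assumptions, a bug that shows up 1% of the time still goes green on about 37% of 100-run batches and about 90% of 10-run batches. More runs lower the odds of missing it, but they never reach "never".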
