sosodev 5mo ago

How can you be sure that your benchmark is meaningful and well designed?

Is publicity the only thing that prevents a benchmark from being meaningful?

prodigycorp 5mo ago

I didn't tell you what you should think about the model. All I said is that you should have your own benchmark.

I think my benchmark is well designed. It's well designed because it's a generalization of a problem I've consistently had with LLMs on my code. Insofar as it encapsulates my coding preferences and communication style, that's the proper benchmark for me.