0 points
Stevvo
6mo ago
The variance is way too high for this test to have any value at all. I ran it 10 times, and each pelican on a bicycle was a better rendition than that; you could say about half of them were perfect.
golly_ned
6mo ago
Compared to the other benchmarks, which are much more gameable, I trust PelicanBikeEval way more.
getnormality
6mo ago
Well, the variance is itself interesting.