My benchmark for large language models

(nicholas.carlini.com)

4 points · cheviethai123 · 2y ago · 2 comments

cheviethai123 (OP) · 2y ago

Notice how low Gemini scores here compared with other LLM tests. I'm also impressed that the evaluation method can assess performance without relying on tailored prompts.

hoamatcuoi · 2y ago

But the benchmark only scores Gemini-Pro 1. I'm curious how Gemini Ultra would perform here, but I guess we can't know yet.
