OpenGameEval: Eval Framework to Benchmark Agentic AI Assistants

(corp.roblox.com)

7 points · moneil9716mo ago · 1 comment

OpenGameEval offers a unique testing ground for evaluating core model capabilities related to agentic reasoning and long-horizon task solving.
