> Keep in mind that modern quantitative approaches to LLM evaluation have been effectively co-designed with the rise of OpenAI, and folks like Ravenwolf routinely disagree with the leaderboards.
Sorry but you're talking complete nonsense here. The benchmark by LMSys (chatbot arena) cannot be gamed, and Ravenwolf is a random-ass poster with no scientific rigor to his benchmarks.