Artificial analysis does good API provider inference benchmarking and has evaluated Cerebras, Groq, Sambanova, the many Nvidia-based solutions, etc. IMO it makes way more sense to benchmark actual usable end points rather than submit closed and modified implementations to mlcommons. Graphcore had the fastest BERT submission at one point (when BERT was relevant lol) and it didn't really move the needle at all.