https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inferen...
But the bigger models are more useful, so that’s what people fixate on.