nomel gave a good answer in a different thread:
> This is not about the model, it’s about the relative speed improvement from the hardware, with this model as a demo.
To compare apples to apples, look at the tokens per second of other systems running Llama 2 70B 4096. We're by far the fastest!
https://news.ycombinator.com/item?id=38742466