I just did:
./mixtral-8x7b-instruct-v0.1.Q8_0.llamafile --cli -t 16 -n 200 -p "In terms of Lasso"
I got 15 tokens per second for prompt evaluation and 8 tokens per second for regular (generation) eval.
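For a sense of scale, here is a back-of-the-envelope sketch of what those rates mean for the `-n 200` run above (the helper function is my own, not part of llamafile):

```python
# Rough time spent generating, given a measured throughput.
# At 8 tokens/second, the 200 tokens requested via -n 200
# take about 25 seconds of generation time.
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    return n_tokens / tokens_per_second

print(generation_seconds(200, 8))  # 25.0
```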
The same hardware can run things much faster on OSX, or with more aggressive quantization, but I prefer to run at Q8 or f16 even if it is slow. In the future I hope to use the GPU, the ANE, and the crazy 1.58- or 0.68-bit quantization schemes, but for now this does the trick handsomely.