Llama.cpp can run quantized Falcon on a top-end Mac Studio, which is only five grand:
https://twitter.com/ggerganov/status/1699791226780975439

If I'm paying a third party a hundred bucks a month, I'd at least want them to be able to match the capacities of consumer hardware.