Using an M1/M2/M3 Max for LLM inference isn't at all "speculative"; it's a thing today, and high-end Apple Silicon is becoming common knowledge as an option within the local inference community. The original author of llama.cpp (one of the leading LLM inference projects) developed it on a Mac, and it has full Metal acceleration support.
The $20/month subscription gives you access to commercial models, but you generally have to run the open-weight models yourself. Because the unified RAM lets the GPU address most of the system memory, you can run the larger 70B+ models (quantized) without much trouble.
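To give a concrete sense of what that looks like, here's a minimal sketch using the llama-cpp-python bindings to llama.cpp; the model path is just a placeholder for whatever quantized GGUF file you've downloaded, and n_gpu_layers=-1 offloads all layers to the GPU via Metal:

    # Minimal sketch, assuming llama-cpp-python is installed with Metal
    # support (the default when built on Apple Silicon). The model path
    # below is a placeholder, not a real file name.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/some-70b-chat.Q4_K_M.gguf",  # placeholder path
        n_gpu_layers=-1,  # offload every layer to the GPU via Metal
        n_ctx=4096,       # context window size
    )

    out = llm("Q: Why does unified memory help with large models? A:", max_tokens=128)
    print(out["choices"][0]["text"])

A 4-bit quantization of a 70B model weighs in at roughly 40GB, which fits comfortably in the 64-128GB of unified RAM you can spec on a Max.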
AI researchers generally still have to use CUDA, since the ecosystem for training and fine-tuning remains mostly CUDA-only, but those who occasionally need to run custom or local models for inference will likely find high-end Macs a good fit for their use cases.