This is product design at its finest.
First of all, they never “handle more requests than they have hardware.” That’s impossible (at least as I’m reading it).
The vast majority of usage is via their web app (and free accounts, at that). The web app defaults to “auto” model selection, and the algorithm behind that selection isn’t disclosed.
As load peaks, they can divert requests to different tiers of hardware and to less resource-hungry models.
Only a very small minority of requests actually specify the model to use.
There are a hundred similar product design hacks they can use to mitigate load. But this seems like the easiest one to implement.
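To make the idea concrete, here is a rough sketch of what load-based “auto” routing could look like. This is purely illustrative: the real selection algorithm is hidden, so the tier names, the thresholds, and the `route` function are all my own assumptions, not anything they have published.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    model: str = "auto"  # "auto" lets the service pick; anything else is an explicit choice


def route(request: Request, load: float) -> str:
    """Pick a model tier for a request, given current load in [0.0, 1.0].

    Tier names and thresholds are hypothetical, chosen only for illustration.
    """
    # The small minority of requests that name a model get what they asked for.
    if request.model != "auto":
        return request.model
    # "auto" requests are steered to cheaper tiers as load climbs.
    if load < 0.6:
        return "large"
    if load < 0.85:
        return "medium"
    return "small"


print(route(Request("hello"), load=0.2))
print(route(Request("hello"), load=0.9))
print(route(Request("hello", model="large"), load=0.9))
```

Since most traffic arrives as “auto”, even a crude rule like this would give them a lot of room to shed load without touching the requests that explicitly pick a model.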