Here is a queueing API server for self-hosted inference backends, from a friend of mine:
https://github.com/aime-team/aime-api-server
It is very lightweight and easy to use. You can even serve models from Jupyter notebooks with it without worrying about overwhelming the server: requests are queued, so it just gets slower the more load you send to it.
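
To illustrate why a queue in front of the backend makes things slower under load instead of falling over, here is a minimal generic sketch using only Python's standard `asyncio` module. This is not the AIME API server's code or its API. `fake_inference`, `worker`, `submit`, and `run_demo` are hypothetical names made up for this example, and the "model" is just a function that upper-cases its input.

```python
import asyncio


async def fake_inference(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (assumption, not a real backend).
    await asyncio.sleep(0.01)
    return prompt.upper()


async def worker(queue: asyncio.Queue) -> None:
    # Take one job at a time, so the backend never handles concurrent requests.
    while True:
        prompt, future = await queue.get()
        try:
            future.set_result(await fake_inference(prompt))
        except Exception as exc:
            # Hand the error back to the caller that submitted this job.
            future.set_exception(exc)
        finally:
            queue.task_done()


async def submit(queue: asyncio.Queue, prompt: str) -> str:
    # Enqueue a job and wait until the worker has filled in its result.
    future = asyncio.get_running_loop().create_future()
    await queue.put((prompt, future))
    return await future


async def main(n_requests: int) -> list:
    queue = asyncio.Queue()
    worker_task = asyncio.create_task(worker(queue))
    # All requests arrive at once; they wait in the queue and are served in turn.
    results = await asyncio.gather(
        *(submit(queue, f"request {i}") for i in range(n_requests))
    )
    worker_task.cancel()
    return list(results)


def run_demo(n_requests: int = 5) -> list:
    return asyncio.run(main(n_requests))


if __name__ == "__main__":
    print(run_demo())
```

With a single worker, more simultaneous requests only means each one waits longer in the queue; the backend itself still sees one job at a time.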