Of course, I could purchase a larger VM at a higher fixed monthly cost, but since this API will only be called a few hundred times per month, I'd expect there to be an option with per-minute or per-call pricing, given that the API mostly sits idle waiting for requests.
I am aware of serverless, but loading the ML models on each call seems like it would take far too long to return a response. If I'm misunderstanding how serverless works, please let me know.
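To make the load-time concern concrete, here is a minimal sketch of what I have in mind. My understanding, which may be wrong, is that serverless platforms generally keep a process warm between calls for some time, so a model cached at process level would only be loaded on a cold start. `load_model` and `handler` are hypothetical names, and the load is simulated with a short sleep rather than real YOLOv3 or OCR weights.

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def load_model() -> dict:
    """Hypothetical stand-in for loading model weights from disk."""
    time.sleep(0.1)  # simulate a slow load; real models take much longer
    return {"name": "placeholder-model"}


def handler(image_bytes: bytes) -> dict:
    """Hypothetical per-request entry point."""
    model = load_model()  # slow only on the first call in this process
    return {"model": model["name"], "size": len(image_bytes)}


if __name__ == "__main__":
    handler(b"first request")   # cold: pays the load cost
    handler(b"second request")  # warm: reuses the cached model
    # cache misses == number of times the model was actually loaded
    print("model loads:", load_model.cache_info().misses)
```

If that assumption holds, the remaining question is how often cold starts would happen with only a few hundred calls per month.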
In case it matters for any of the answers: I'm using FastAPI for the web side and Celery for the task queue. YOLOv3 detects objects of interest in an image, and the object image is then passed to a second model that performs OCR and predicts the text it finds. I'm new to ML, so I have a lot to learn and appreciate all feedback.
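For reference, the detect-then-OCR flow described above looks roughly like the sketch below. Everything here is a placeholder: `detect_objects`, `crop`, `read_text`, and `run_pipeline` are hypothetical names, the image is a toy list of pixel rows, and neither YOLOv3 nor a real OCR model is actually called.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int


def detect_objects(image: Image) -> List[Box]:
    """Placeholder for the YOLOv3 step: returns one fixed box."""
    return [Box(x=1, y=1, w=2, h=2)]


def crop(image: Image, box: Box) -> Image:
    """Cut the detected region out of the full image."""
    return [row[box.x:box.x + box.w] for row in image[box.y:box.y + box.h]]


def read_text(region: Image) -> str:
    """Placeholder for the OCR model: reports the crop size instead of text."""
    return f"<text from {len(region[0])}x{len(region)} crop>"


def run_pipeline(image: Image) -> List[str]:
    """Detect objects, crop each one, and run OCR on every crop."""
    return [read_text(crop(image, box)) for box in detect_objects(image)]


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(4)]
    print(run_pipeline(image))
```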