Cheapest way to deploy smaller fine-tuned AI models?

2 pointsjohhns42y ago2 comments

Any tips on how to deploy and use a fine-tuned model on Huggingface in a cost effective way? Right now looking into use Gradio with HuggingFace spaces and using the API endpoint from there. Inference endpoints and Sagemaker seem excessive for this. The whole idea to use smaller models is to decrease costs (vs using a bigger model with an API endpoint) but maybe this just isn't cost effective for where we are right now.

2 comments

ilaksh2y ago

If you're only using it incrementally then Replicate and Modal Labs have per-second pricing.

Not sure about HuggingFace though.

Sagemaker supposedly has a Serverless endpoint but haven't looked into it and doubt it would be a good deal since it's AWS.

johhns4OP2y ago

Looks like replicate is perfect. Will look into it. Thanks!

j / k navigate · click thread line to collapse