• Developers will always want choice.
• Open-source LLMs keep getting better.
• Anthropic ships fantastic models.
• Your app's data isn't exposed to multiple companies.
• Security, billing, and config are consolidated in AWS.
• You get the power of the AWS ecosystem.
P.S.: I'm from Inferless.
Do you mean Bedrock Knowledge Bases/RAG specifically -- the feature that uses OpenSearch Serverless, which costs at minimum ~$200/month because it doesn't scale to zero?
That's what keeps me wrangling EC2 instances for ML teams, though I do wonder how much longer it will last.
A more general issue is that the workloads that tend to run on GPUs are much bigger than a standard Lambda-sized workload (think a 20 Gi image with a smorgasbord of ML libraries). I've spent time working around this problem and wrote a bit about it here: https://www.beam.cloud/blog/serverless-platform-guide
You can do this with SR-IOV-enabled hardware.
https://docs.nvidia.com/networking/display/mlnxofedv581011/s...
I've been quite a satisfied customer of Runpod's serverless GPU offering, running a side project that uses computer vision to detect toxic clouds in webcam feeds of an industrial site.
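To give a sense of the programming model, here is a minimal sketch of a Runpod serverless worker. The handler body and the `frame_url` input field are my invention for illustration, not the actual project's code; only the `runpod.serverless.start` wiring follows Runpod's documented SDK.

```python
# Minimal sketch of a Runpod serverless worker. The handler receives a
# job payload under event["input"] and returns a JSON-serializable result.

def handler(event):
    # "frame_url" is a hypothetical input field for this sketch.
    frame_url = event["input"]["frame_url"]
    # ... download the frame and run the detection model here ...
    return {"frame_url": frame_url, "toxic_cloud_detected": False}

# Worker entrypoint (requires the `runpod` SDK in the container image):
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

The platform handles queueing, scaling, and cold starts; your code is just the handler.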
If you want generative AI, try Replicate, as they offer a more specialized product.
But I'm not aware of a "lambda"-like serverless option for any old CUDA workload. Given model loading times, it wouldn't really make sense. Something like Cloud Run or Knative for GPUs would be cool.
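For what it's worth, Knative can already schedule GPUs if the underlying cluster exposes them via the NVIDIA device plugin. A hedged sketch of such a Service (the name and image are mine, not from any product):

```yaml
# Hypothetical Knative Service running a CUDA workload with scale-to-zero.
# Assumes a cluster with the NVIDIA device plugin installed.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cuda-worker
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"   # scale to zero when idle
    spec:
      containers:
        - image: example.com/cuda-worker:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: "1"                # one GPU per pod
```

The cold-start problem mentioned above still applies, of course: pulling a multi-gigabyte image and loading model weights on scale-from-zero can take minutes.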
A serverless boilerplate for AI apps on trusted AWS infra.
• Full-Stack w/ Chat UI + Streaming
• Multiple LLM Models + Data Privacy
• 100% Serverless
• API + Event Architecture
• Auth, Multi-Env, GitHub Actions & more!
GitHub: https://github.com/serverless/aws-ai-stack
Demo: https://awsaistack.com