• Developers will always want choice.
• Open-source LLMs keep getting better.
• Anthropic ships fantastic models.
• Your app's data isn't exposed to multiple companies.
• Security, billing, and config are consolidated in AWS.
• You get the power of the AWS ecosystem.
P.S.: I'm from Inferless.
Do you mean Bedrock Knowledge Bases/RAG specifically -- the feature that uses OpenSearch Serverless, which costs at minimum ~$200/month because it doesn't scale to zero?
That's what keeps me wrangling EC2 instances for ML teams, though I do wonder how much longer it will last.
A more general issue is that the workloads that tend to run on GPUs are much bigger than a standard Lambda-sized workload (think a 20 Gi image with a smorgasbord of ML libraries). I've spent time working around this problem and wrote a bit about it here: https://www.beam.cloud/blog/serverless-platform-guide
You can do this with SR-IOV-enabled hardware.
https://docs.nvidia.com/networking/display/mlnxofedv581011/s...
I've been quite a satisfied customer of Runpod's serverless GPU offering, running a side project that uses computer vision to detect toxic clouds in webcam feeds of an industrial site.
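To give a sense of the programming model, here is a minimal sketch of a Runpod serverless worker. The handler body and the `frame_url` input field are my invention for illustration, not the actual project's code; only the `runpod.serverless.start` wiring follows Runpod's documented SDK.

```python
# Minimal sketch of a Runpod serverless worker. The handler receives a
# job payload under event["input"] and returns a JSON-serializable result.

def handler(event):
    # "frame_url" is a hypothetical input field for this sketch.
    frame_url = event["input"]["frame_url"]
    # ... download the frame and run the detection model here ...
    return {"frame_url": frame_url, "toxic_cloud_detected": False}

# Worker entrypoint (requires the `runpod` SDK in the container image):
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

The platform handles queueing, scaling, and cold starts; your code is just the handler.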
If you want generative AI, try Replicate, as they offer a more specialized product.
But I'm not aware of a "lambda"-like serverless option for any old CUDA workload. Given model loading times, it wouldn't really make sense. Something like Cloud Run or Knative for GPUs would be cool.
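For what it's worth, Knative can already schedule GPUs if the underlying cluster exposes them via the NVIDIA device plugin. A hedged sketch of such a Service (the name and image are mine, not from any product):

```yaml
# Hypothetical Knative Service running a CUDA workload with scale-to-zero.
# Assumes a cluster with the NVIDIA device plugin installed.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cuda-worker
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"   # scale to zero when idle
    spec:
      containers:
        - image: example.com/cuda-worker:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: "1"                # one GPU per pod
```

The cold-start problem mentioned above still applies, of course: pulling a multi-gigabyte image and loading model weights on scale-from-zero can take minutes.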
A serverless boilerplate for AI apps on trusted AWS infra.
• Full-Stack w/ Chat UI + Streaming
• Multiple LLM Models + Data Privacy
• 100% Serverless
• API + Event Architecture
• Auth, Multi-Env, GitHub Actions & more!
GitHub: https://github.com/serverless/aws-ai-stack
Demo: https://awsaistack.com