
0 points · hn_throwaway_99 · 7y ago · 0 comments

For serverless technologies to "win" they have to solve the "cold start" problem. AWS likes to pitch Lambdas as an easy mobile backend, but if you need to talk to a DB (which most mobile backends do) then you'll want to put your Lambda in a VPC. That puts cold starts on the order of 5-10 seconds, which is a deal breaker for most synchronous APIs.

I don't understand why AWS or GCP haven't added "pre-warming" requests to their cloud functions, similar to App Engine.


12 comments · 6 top-level

stevehawk · 7y ago · 5 in thread

why the need to put lambdas in a VPC in order to hit a DB?

merlincorey · 7y ago

A VPC in AWS is essentially a virtual datacenter.

For many years now, essentially all AWS services have been tied to a VPC.

Each account gets 5 VPCs per region, by default.

Whether you use RDS or EC2 to set up a database server, it will be tied to a VPC for networking isolation purposes.

As such you then would need the Lambda in the VPC, or to allow public internet access to the database.

The point is pretty moot though, because you can schedule Cloudwatch Events every 4 minutes to keep a lambda warm, if necessary.

Frameworks like Zappa even do this for you automatically.
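A minimal sketch of what the keep-warm trick looks like on the function side, assuming a Python Lambda triggered by a CloudWatch Events schedule. Scheduled events arrive with `"source": "aws.events"` and `"detail-type": "Scheduled Event"`; `handle_request` is a hypothetical stand-in for the real work.

```python
def is_warmup_ping(event):
    """Return True when the invocation came from a CloudWatch Events schedule."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def handle_request(event):
    # Hypothetical placeholder for the real work (DB query, business logic, ...).
    return {"statusCode": 200, "body": "hello"}


def handler(event, context):
    if is_warmup_ping(event):
        # Return right away so the ping uses as little billed time as possible.
        return {"warmed": True}
    return handle_request(event)
```

Note that a single scheduled rule like this only keeps one container alive.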

hn_throwaway_99 (OP) · 7y ago

> The point is pretty moot though, because you can schedule Cloudwatch Events every 4 minutes to keep a lambda warm, if necessary.

I encourage you to read this article, https://theburningmonk.com/2018/01/im-afraid-youre-thinking-... , because if you're running a web API with Lambdas, keeping one instance warm with the "cloudwatch event every 4 minutes" trick will most definitely not solve your cold start issues.
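A rough back-of-the-envelope sketch of why one warm instance is not enough. Each Lambda container handles one request at a time, so the number of containers in use is roughly request rate times average duration (Little's law). The function names and the traffic numbers are made up for illustration.

```python
import math


def containers_needed(requests_per_second, avg_duration_ms):
    """Approximate concurrent containers via Little's law (rate x duration)."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def containers_starting_cold(requests_per_second, avg_duration_ms, kept_warm=1):
    """Containers that must be cold-started when traffic arrives at this rate."""
    return max(0, containers_needed(requests_per_second, avg_duration_ms) - kept_warm)


# Illustrative numbers only: 50 req/s at 200 ms each, one container kept warm.
needed = containers_needed(50, 200)
cold = containers_starting_cold(50, 200, kept_warm=1)
print(f"{needed} containers needed, {cold} of them start cold")
```

With one ping every few minutes, every container beyond the first still pays the full cold start whenever traffic ramps up.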


skluck · 7y ago

It is somewhat arbitrary though, isn't it? If AWS adds the ability to use security groups without a VPC, a lot of these issues (cold starts caused by VPC or siloed infrastructure limits) go away.

There are still reasons to be in a private network: being "one typo away" from exposing your services/db to the world is scary. But that seems like a solvable problem as well...

jflans · 7y ago

I think to do otherwise you'd have to have the DB open to the public internet.

buzzdenver · 7y ago

Because the DB doesn't have a public IP, which is very good.

013a · 7y ago · 1 in thread

It kind of makes sense, but mostly from a marketing and planning standpoint. If you need pre-warming, that probably means you know enough about lambda (and AWS) to know that you need pre-warming. Setting it up via a Cloudwatch event is easy as pie.

Think about the implications if they added a button to the lambda console to "pre-warm". There are two options: (1) set up the CloudWatch event for you (a pattern similar to what we've seen AWS use for things like DynamoDB table autoscaling), or (2) have some other internal system which can keep them warm.

It's easy to say "just do (1), it'd be so easy", but the issue is that it introduces a very weird cost pattern to lambda. Lambda isn't just billed per invocation, it's also billed for how long each invocation runs. So if they auto-configure a CloudWatch event, let's say one that sends an empty `{}` argument, they have no idea how long your function is designed to run given that input. Moreover, they don't even know that your function won't error with that input. So they've got this new feature and even they can't predict what it will do to your bill or system stability, given the fact that we're dealing with arbitrary code blobs.

The only option is (2). Now think about allocating engineering effort to this problem: as a manager, would you rather allocate a team to work on an extra complex scheduling parameter, or continue to improve the fundamental warm-up time for any function? Maybe both. But now you've got this extra parameter there which increases customer expectations and makes future scheduling work much more difficult.
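To make the cost point under option (1) concrete, here is a hedged sketch of what scheduled pings would add to a bill. The two prices are assumed placeholders for a per-request and a per-GB-second rate (check the current pricing page before relying on them), and the function names are hypothetical. The point is that the result depends on `billed_ms`, which only the function's author can know.

```python
# Assumed placeholder prices, not authoritative.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667

MINUTES_PER_MONTH = 30 * 24 * 60


def pings_per_month(interval_minutes, containers=1):
    """Number of warm-up invocations in a 30-day month."""
    return containers * MINUTES_PER_MONTH / interval_minutes


def monthly_ping_cost(interval_minutes, billed_ms, memory_mb, containers=1):
    """Request charge plus duration charge for the warm-up pings alone."""
    pings = pings_per_month(interval_minutes, containers)
    gb_seconds = pings * (billed_ms / 1000) * (memory_mb / 1024)
    return pings * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


# Same schedule, same memory; only the time the function spends on `{}` differs.
fast = monthly_ping_cost(interval_minutes=4, billed_ms=100, memory_mb=512)
slow = monthly_ping_cost(interval_minutes=4, billed_ms=3000, memory_mb=512)
print(f"fast handler: ${fast:.4f}/month, slow handler: ${slow:.4f}/month")
```

A function that returns early on the ping stays near the cheap end. One that runs its full code path, or times out on an empty payload, does not.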

jacques_chester · 7y ago

The keep-alive behaviour of Lambda is essentially a subsidy. I suspect that AWS have paid a lot of attention to ways to cut it down, or are at least hoping that they can drive down the idling cost without anyone noticing too much of a performance hit.

jacques_chester · 7y ago

> For serverless technologies to "win" they have to solve the "cold start" problem.

I've said here and elsewhere before that autoscaling is easy to say and hard to do.

We keep looking to autoscalers to divine our economic preferences, which they cannot do for us. What's been missing is the ability to explicitly trade off latency for expense.

The best you can do is to a) attack startup time anyhow, any way possible, b) react sanely to unexpected traffic changes, c) make reasonable forecasts and d) explicitly tune cost of idleness vs cost of delay vs probability of delay. These help, but the problem will never fully go away.

(Unless you've discovered an escape hatch from either of causality or integral calculus. If you have, please share it with the class.)
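A toy sketch of point d), under loud assumptions: concurrent demand is modelled as Poisson with a known mean, a cold start happens whenever demand exceeds the number of warm instances, and idle cost and delay cost are numbers you pick yourself. None of this is how any real autoscaler works. It only shows the trade-off being made explicit.

```python
import math


def poisson_tail(mean, n):
    """P(X > n) for X ~ Poisson(mean)."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for k in range(1, n + 1):
        term *= mean / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def expected_cost(warm, mean_demand, idle_cost, delay_cost):
    """Cost of idleness plus cost of delay weighted by probability of delay."""
    return warm * idle_cost + poisson_tail(mean_demand, warm) * delay_cost


def best_warm_count(mean_demand, idle_cost, delay_cost, max_warm=200):
    """Warm-instance count with the lowest expected cost in this toy model."""
    return min(
        range(max_warm + 1),
        key=lambda n: expected_cost(n, mean_demand, idle_cost, delay_cost),
    )


# Illustrative numbers only: the pricier a delay, the more idle capacity you buy.
for delay_cost in (10, 100, 1000):
    print(delay_cost, best_warm_count(mean_demand=5, idle_cost=1.0, delay_cost=delay_cost))
```

The probability of delay never reaches zero for any finite number of warm instances, which matches the point that the problem never fully goes away.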

whopa · 7y ago

I think this is why AWS promotes DynamoDB so much, because it's one of the few datastores that work sanely with non-VPC lambdas.

tango12 · 7y ago

Or you can use serverless asynchronously only.

Have a low latency container based API with min replicas and auto-scale, almost like an atomic CRUD API. Move as much as possible to async serverless, triggered by events.

k__ · 7y ago

Hasn't AppSync already solved that problem?
