I don't understand why AWS or GCP haven't added "pre-warming" requests to their cloud functions, similar to App Engine.
For many years now, essentially all AWS services have been tied to a VPC.
Each account gets 5 VPCs per region, by default.
Whether you use RDS or EC2 to set up a database server, it will be tied to a VPC for network isolation purposes.
As such, you would then need to put the Lambda in the VPC, or allow public internet access to the database.
The point is pretty moot though, because you can schedule CloudWatch Events every 4 minutes to keep a Lambda warm, if necessary.
Frameworks like Zappa even do this for you automatically.
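For anyone who hasn't seen the keep-warm trick, here is a minimal sketch of the handler side in Python. It assumes the ping arrives as a standard scheduled event (`"source": "aws.events"`, `"detail-type": "Scheduled Event"`); the names `is_warmup_ping` and `handler` and the response shapes are made up for illustration, and this is not how Zappa itself implements it.

```python
def is_warmup_ping(event):
    """True if the event looks like a scheduled keep-warm ping."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def handler(event, context=None):
    # Return immediately on a keep-warm ping so the billed duration stays tiny.
    if is_warmup_ping(event):
        return {"warmed": True}
    # Hypothetical stand-in for the real request handling.
    return {"statusCode": 200, "body": "real work done"}
```

The early return is the important part: without it the ping runs your real code path with an input it was never designed for.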
I encourage you to read this article, https://theburningmonk.com/2018/01/im-afraid-youre-thinking-... , because if you're running a web API with Lambdas, keeping one instance warm with the "CloudWatch event every 4 minutes" trick will most definitely not solve your cold start issues.
There are still reasons to be in a private network: being "one typo away" from exposing your services/db to the world is scary. But that seems like a solvable problem as well...
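A toy model of why one warm instance isn't enough, assuming each in-flight request occupies its own instance and any request that finds no idle instance triggers a cold start. This is my own simplification (it ignores cold start duration and idle instances being reclaimed), and `count_cold_starts` is a made-up name:

```python
def count_cold_starts(arrivals, duration, prewarmed=1):
    """Count requests that find no idle warm instance.

    arrivals:  request arrival times, in seconds
    duration:  how long each request occupies an instance, in seconds
    prewarmed: number of instances kept warm by the ping
    """
    free_at = [0.0] * prewarmed  # time at which each warm instance is idle again
    cold = 0
    for t in sorted(arrivals):
        idle = [i for i, f in enumerate(free_at) if f <= t]
        if idle:
            free_at[idle[0]] = t + duration  # reuse a warm instance
        else:
            cold += 1                        # nothing idle: cold start
            free_at.append(t + duration)     # the new instance stays warm afterwards
    return cold
```

Three requests arriving at the same moment against a single pre-warmed instance give two cold starts in this model, while the same three requests spaced well apart give none.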
Think about the implications if they added a button to the Lambda console to "pre-warm". There are two options: (1) set up the CloudWatch event for you (a pattern similar to what we've seen AWS use for things like DynamoDB table autoscaling), or (2) have some other internal system which can keep them warm.
It's easy to say "just do (1), it'd be so easy", but the issue is that it introduces a very weird cost pattern to Lambda. Lambda isn't just billed per invocation, it's billed essentially by time live. So if they auto-configure a CloudWatch event, let's say it sends an empty `{}` argument, they have no idea how long your function is designed to run given that input. Moreover, they don't even know that your function won't error with that input. So they've got this new feature and even they can't predict what it will do to your bill or system stability, given the fact that we're dealing with arbitrary code blobs.
The only option is (2). Now think about allocating engineering effort to this problem: as a manager, would you rather allocate a team to work on an extra complex scheduling parameter, or continue to improve the fundamental warm-up time for any function? Maybe both. But now you've got this extra parameter there which increases customer expectations and makes future scheduling work much more difficult.
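To put rough numbers on the billing concern with option (1), here is a back-of-the-envelope sketch assuming cost is a per-request fee plus a charge per GB-second of execution. The function names are made up, and the prices in the usage lines are placeholders, not real AWS pricing:

```python
def pings_per_month(interval_minutes, days=30):
    """Number of keep-warm pings in a month at a fixed interval."""
    return days * 24 * 60 // interval_minutes


def monthly_warm_cost(interval_minutes, duration_ms, memory_mb,
                      price_per_gb_second, price_per_request):
    """Monthly cost of keep-warm pings alone, under the assumed billing model."""
    pings = pings_per_month(interval_minutes)
    gb_seconds = pings * (duration_ms / 1000) * (memory_mb / 1024)
    return pings * price_per_request + gb_seconds * price_per_gb_second


# Placeholder prices for illustration only.
PRICE_GB_S = 0.00002
PRICE_REQ = 0.0000002

# Same schedule, same memory; only the run time on the `{}` input differs.
fast = monthly_warm_cost(4, duration_ms=5, memory_mb=512,
                         price_per_gb_second=PRICE_GB_S, price_per_request=PRICE_REQ)
slow = monthly_warm_cost(4, duration_ms=3000, memory_mb=512,
                         price_per_gb_second=PRICE_GB_S, price_per_request=PRICE_REQ)
```

The platform controls the interval but not `duration_ms`, which is why it can't tell you in advance what the button would cost.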
I've said here and elsewhere before that autoscaling is easy to say and hard to do.
We keep looking to autoscalers to divine our economic preferences, which they cannot do for us. What's been missing is the ability to explicitly trade off latency for expense.
The best you can do is to a) attack startup time anyhow, any way possible, b) react sanely to unexpected traffic changes, c) make reasonable forecasts and d) explicitly tune cost of idleness vs cost of delay vs probability of delay. These help, but the problem will never fully go away.
(Unless you've discovered an escape hatch from either causality or integral calculus. If you have, please share it with the class.)
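One way to make the tuning in d) concrete, as a sketch only: assume concurrent demand in some window is Poisson-distributed, each warm instance costs `idle_cost` to hold, and running short costs a flat `delay_cost`. The Poisson assumption, the flat penalty, and every name here are mine, not something an autoscaler gives you.

```python
import math


def poisson_tail(mean, n):
    """P(X > n) for X ~ Poisson(mean)."""
    cdf = sum(math.exp(-mean) * mean ** k / math.factorial(k) for k in range(n + 1))
    return max(0.0, 1.0 - cdf)


def best_warm_count(mean_concurrency, idle_cost, delay_cost, max_n=50):
    """Warm instance count minimising idle cost plus expected delay cost."""
    def expected_cost(n):
        # Cost of idleness + probability of delay * cost of delay.
        return n * idle_cost + poisson_tail(mean_concurrency, n) * delay_cost
    return min(range(max_n + 1), key=expected_cost)
```

Cheaper idleness or costlier delay pushes the answer up, and the reverse pushes it down. The point is that `idle_cost` and `delay_cost` are economic preferences you have to state yourself.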
Have a low-latency, container-based API with min replicas and autoscaling, almost like an atomic CRUD API. Move as much as possible to async serverless functions triggered by events.