> Two reasons: Disaster recovery and freedom from vendor lock in.
In my experience, those two reasons are almost never sufficient to warrant a multi-cloud solution. The costs for multi-cloud are enormous. Another commenter mentioned egress costs, but there are numerous other costs:
1. You've added a lot of complexity on top of existing cloud solutions. That complexity can make things fail in unique ways that may make some of your cherished reliability benefits moot.
2. You are always coding to the "lowest common denominator" of any cloud service, meaning you're missing out on a ton of productivity by forgoing useful services.
I'm just curious whether anyone with actual multi-cloud experience can comment: was it worth it?
Here is what we found. Previously, when people talked about going to the cloud, the state of the art was to target a specific cloud provider: you put your software in immutable AMIs and use ASGs and ELBs, along with S3 and EBS, to build really robust systems. You instrument everything with CloudWatch and make sure everything is locked down with IAM and security groups.
What we have seen lately is that Kubernetes has changed all of that. Most systems being designed today are very much provider agnostic, and the only time you want to be locked into a specific technology is when the vendor-provided solution has no real alternative in a truly vendor-agnostic stack. Part of what this service does is take the last true bit of gravity a cloud provider has and remove it: you can now run in both clouds just as easily as if you were all in on one of them. There are some additional costs if you are transferring all your data across the wire, but that is where the power of Vitess's sharding comes in. You can run your service across two clouds while minimizing the amount of cross-talk, until you want to migrate off.
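To illustrate how sharding keeps cross-cloud traffic down, here is a toy Python sketch (this is not Vitess's actual routing code; the shard names mimic Vitess's keyspace-ID range convention, but the cloud assignments are entirely made up for the example):

```python
# Toy model: each shard owns a keyspace-ID range AND lives in one cloud.
# Queries for a row go to the shard that owns it, so most traffic stays
# inside a single provider; only cross-shard queries cross the VPN.
from dataclasses import dataclass

@dataclass
class Shard:
    name: str
    cloud: str        # hypothetical placement, e.g. "aws" or "gcp"
    key_range: range  # keyspace IDs this shard owns (0-255 for brevity)

SHARDS = [
    Shard("-80", "aws", range(0, 128)),
    Shard("80-", "gcp", range(128, 256)),
]

def shard_for(keyspace_id: int) -> Shard:
    """Pick the shard (and therefore the cloud) that owns this row."""
    for shard in SHARDS:
        if keyspace_id in shard.key_range:
            return shard
    raise ValueError(f"no shard owns keyspace id {keyspace_id}")

print(shard_for(42).cloud)   # rows in "-80" are served from AWS
print(shard_for(200).cloud)  # rows in "80-" are served from GCP
```

During a migration you would move ranges from one cloud's shards to the other's; reads and writes for already-moved ranges never touch the old provider.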
Also, while this post makes a big deal about being multi-cloud, this also gives you true multi-region databases. That's something that was previously only available with Spanner or CosmosDB, both of which require you to target them explicitly. PlanetScaleDB lets you use your existing MySQL-compatible software.
I worked at a company that wrote an internal infrastructure/deploy management tool based on Kubernetes. You could deploy an app either to our colo facility (we were an old business transitioning to the cloud), or you could deploy it to AWS. As a developer I never interacted with the AWS console; this internal tool hid it all from me. However, while I had the option of deploying to our colo or to the cloud, it was one or the other: the service only ran on one platform in prod.
And after a multi-year push to the cloud, the company actually had to stop that huge push because costs were spiralling out of control. Managing all the costs across a huge enterprise of many services (some micro, some not) became a huge challenge. I can't even imagine the additional cost or complexity if some of those services had spanned multiple cloud providers.
I thought you all were offering Vitess, not a "custom" solution. Or is that marketing speak?
Proprietary software can be good in contexts where the integration surface is small. But the denominator you code against so deeply that you become intertwined with it, and would have a hard time backing out of, should be open source.
The "lowest common denominator" being Linux, Postgres, Kubernetes, etc. is not such a terrible thing.
So global high availability (hybrid multi-cloud) and no vendor lock-in are pretty important.
Many enterprises got really excited about multi-cloud really fast but then gave up once they faced the harsh reality of increased complexity (and, as a consequence, cost and time).
People tend to grossly overestimate the actual size of the 'common denominator'. On the face of it, services look almost identical (every cloud has compute, blob storage, block storage, etc.). In reality, there are so many subtle differences between analogous services (API, pricing model, failure modes, performance, security models) that the support and operations cost easily more than doubles. So even in large enterprises where there is a top-down edict to 'be multi-cloud', the actual BU/team silos tend to stick to one provider or the other.
The other often overlooked aspect is that clouds offer bulk discounts (I’ve seen up to 40%) for customers that spend many tens or hundreds of millions of dollars, but one can only get this if they stick to one provider. In other words, the economics of multi-cloud doesn’t scale well.
Kubernetes strives to be the common layer, but cross-cloud deployments tend to be very convoluted and non-trivial. They require a lot of manual work, tons of expertise in different domains (networking, security), greatly increase management overhead, and introduce funny new failure modes. In addition, Kubernetes is just one piece of the puzzle. After one is done with Kubernetes, they still need to figure out their user-facing services for launching containers, provisioning and managing databases, message queues, analytics, and machine learning pipelines. Now compare that entire ordeal with clicking a few buttons (or writing several pages of Terraform) and having everything set up and ready to go in an instant and on demand. Yes, you do have to tie yourself to a single cloud, give up some degrees of freedom, and use extremely high-margin proprietary services (like Kinesis instead of Kafka), but it's so, so much easier, faster and cheaper (at least in the short to medium term).
To answer the original question (and sorry for going off on a tangent), I haven’t seen any company actually succeed with multi-cloud (my sample is 100+ SMBs and large enterprises). Even extremely tech-savvy and sophisticated companies like Twitter (with their shift to GCP) tend to think of all this as something that doesn’t necessarily need to be built anymore and should be bought instead.
What I do see all the time is companies trying to figure out a way to marry their existing on-prem and (single) cloud provider setups. It is still a struggle, though, and will probably always be to a certain extent.
Edit: typos
We're starting a journey on Azure and AWS at once with limited financial resources and limited talent (it's tough to hire in cloud skills to work for us, and our stack is so old it's not an easy transition for people who only know that). Operating AWS and Azure requires different skill sets and different approaches, and they're far from transferable. All the tools and techniques we develop or acquire for managing AWS are not applicable to Azure and vice versa, and because we're splitting our effort between the two, everything takes twice as long.
I think the right way for a company like ours to approach this would be to go "all in" on one, build expertise and deliver a lot of value back to the business, then look to build out the second cloud to meet our BCP/cost-savings goals.
But yeah, when I look at the rate that some companies sunset their products, I understand the fear a bit.
How does PlanetScale handle the complexity of DIY Classic VPN, and how does it ensure high availability on those VPN links and that a certain amount of throughput can be sustained?
Is there a requirement for PlanetScale to create a full network mesh between all cloud providers and all regions? I'm assuming it's more selective, because a full mesh becomes untenable as more cloud regions pop up, requiring n(n-1)/2 VPN links, where n is the number of cloud regions.
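For concreteness, here is the quadratic growth that formula implies, compared against a hub-and-spoke design (e.g. routing through a transit gateway, which the reply below also mentions). A quick Python sketch:

```python
# Number of point-to-point VPN links for n cloud regions:
# a full mesh needs n(n-1)/2 links, a hub-and-spoke design needs n
# (one link from each region to the hub).
def full_mesh_links(n: int) -> int:
    return n * (n - 1) // 2

def hub_and_spoke_links(n: int) -> int:
    return n

for n in (3, 5, 10, 20):
    print(f"{n} regions: mesh={full_mesh_links(n)}, hub={hub_and_spoke_links(n)}")
# 3 regions need only 3 mesh links, but 10 need 45 and 20 need 190,
# which is why a full mesh becomes untenable as regions are added.
```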
Happy to learn anything I can here. Thanks for the blog post.
For HA of the VPN links, most providers handle it automatically: AWS <-> Azure and AWS <-> GCP are both HA links offered by the providers. Azure <-> GCP is a Classic VPN, so we need two of them, and we manage the routes to make sure they fail over in the event of a loss of one link.
Throughput is another story; we are very much limited by the throughput of the various VPNs. We haven't pushed GCP or Azure to the max to see what they can do, but according to the documentation we should expect around 300 Mbps across each link in the mesh before we start to see throttling. At that point it makes sense for us to move to a colocated exchange and peer with dedicated connections.
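To put that 300 Mbps ceiling in perspective, a back-of-envelope calculation (my own numbers, not from the post; it assumes the link is saturated and ignores protocol overhead):

```python
# Rough transfer-time estimate over a bandwidth-limited VPN link.
def transfer_hours(gigabytes: float, mbps: float = 300.0) -> float:
    """Hours to move `gigabytes` (decimal GB) at `mbps` megabits/second."""
    bits = gigabytes * 8 * 1000**3       # GB -> bits
    seconds = bits / (mbps * 1000**2)    # Mbps -> bits per second
    return seconds / 3600

for gb in (100, 1000, 10_000):
    print(f"{gb:>6} GB -> {transfer_hours(gb):.1f} h at 300 Mbps")
```

Moving 100 GB takes under an hour, but a 10 TB dataset is on the order of three days, which is where dedicated peering starts to pay for itself.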
Finally, for other providers: as we start to get them inbound, we will look at using the various providers' transit gateways to reduce the total number of links needed, or at standing up virtual routers to act as exchanges.
Hopefully we will be doing another post with more technical details and some benchmarks!
Multi-cloud is a network problem. Ask anybody who knows what they're doing: is it the best idea to have dependencies over WAN? No. Can it be a solution to a problem? Yes, but what's your problem? PlanetScale might have a case if their product sells.
Only then come the platform problems.