I am not a DevOps guru, but it seems like an interesting project. AWS costs almost twice as much as analogous DO servers, but AWS has way more features.
DO community thread: https://www.digitalocean.com/community/questions/autoscaling-solutions-for-digital-ocean-are-there-existing-solutions
Of course it will be open source.
In the end, we only used it for our sandbox environment, as the production env runs on bare metal (more capacity, cheaper at scale, easier on admin).
So I'd say, from our experience:
- DO's API was quite easy to work with
- consul.io was used as a reliable distributed source of information, for leader elections and health monitoring... Changing the autoscaler configuration in consul.io produced immediate results like starting/stopping new instances... Cool "remote control" effect ;-)
- haproxy/nginx load balancers use consul.io templates to update their configuration
- our autoscaler was HA, through a leader election. The instances managed themselves (no single point of failure). There were at least 2 instances running.
- you should expect a few "surprises" if you're running consul.io on Digital Ocean: heartbeats are delayed quite often (depending on the datacenter), which makes failure detection hard
- and of course, we used DO custom images to start new instances
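For reference, starting a new instance from a custom image is a single POST to the DO v2 API. A minimal sketch, assuming a pre-built snapshot ID and a personal access token (the endpoint and fields are from the public API; the helper names are mine):

```python
import json
import urllib.request

API = "https://api.digitalocean.com/v2"

def droplet_payload(name, image, region="nyc3", size="s-1vcpu-1gb"):
    """Request body for droplet creation; `image` is a custom snapshot ID."""
    return {"name": name, "region": region, "size": size, "image": image}

def create_droplet(payload, token):
    """POST /v2/droplets and return the new droplet's ID."""
    req = urllib.request.Request(
        f"{API}/droplets",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["droplet"]["id"]
```

The autoscaler leader would call something like `create_droplet(...)` when consul signals a scale-up, then wait for the droplet to become active before registering it.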
Isn't that true for any CP store in any "cloud" provider?
As I said, we've deployed our production environment on bare metal (in reaction to this exact problem...)
This is counter to most horizontal scaling strategies but it's really about the same. When you add more servers you're essentially just adding more CPUs and RAM via VMs. Being able to do it on the same machines minus any configuration time or provisioning time is really slick (especially for DBs).
Setting up a load balancer in front of a few instances that could take advantage of rolling vertical scaling would be a spin on autoscaling that played to one of DO's real strengths.
You create a new droplet alongside the one that's running and then flip the IP to point to the new droplet. No downtime.
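The flip itself maps to the Floating IP "assign" action in the DO v2 API. A minimal sketch, assuming a floating IP has already been reserved (the action type and endpoint are real; the function names are illustrative):

```python
import json
import urllib.request

def reassign_action(droplet_id):
    """Body for the floating-IP 'assign' action (DO API v2)."""
    return {"type": "assign", "droplet_id": droplet_id}

def flip_floating_ip(ip, droplet_id, token):
    """Point an existing floating IP at a freshly provisioned droplet."""
    req = urllib.request.Request(
        f"https://api.digitalocean.com/v2/floating_ips/{ip}/actions",
        data=json.dumps(reassign_action(droplet_id)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["action"]["status"]
```

Since the reassignment happens at DO's network edge, clients keep hitting the same IP throughout the swap.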
If I were you, I'd make heavy use of ansible, or something similar, for provisioning:
1) folks are familiar with it 2) could make it cross-platform more easily 3) well, no reinventing the wheel.
For example, Ansible has ec2 module http://docs.ansible.com/ansible/ec2_module.html where you describe an instance and the number of them that should be running. So if you have 3 instances running and wish to have 5, it does the magic and spins up new ones. Then, you can add them to a load balancer. Maybe there's something similar for DO already?
The way I see it is that it would poll if scaling conditions are met and execute ansible playbooks if they are, and then some web interface to set the conditions / view the scaling logs / current status.
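The polling side could be as simple as a threshold policy that computes a desired instance count and hands it to an `ansible-playbook` run. A hypothetical sketch (the playbook name, thresholds, and `instance_count` variable are all made up for illustration):

```python
import subprocess

def desired_count(current, load, up=0.75, down=0.25, minimum=2):
    """Naive policy: add a node when load is high, drop one when low."""
    if load > up:
        return current + 1
    if load < down:
        return max(minimum, current - 1)
    return current

def apply_scaling(count, playbook="scale.yml"):
    """Hand the target count to an (assumed) playbook that reconciles it."""
    subprocess.run(
        ["ansible-playbook", playbook, "-e", f"instance_count={count}"],
        check=True,
    )
```

The web interface then just edits the thresholds and tails the playbook output.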
It can turn out to be a very entertaining and educational side project :)
If you decide to do it, drop me an email - something I would be happy to brainstorm and discuss about :)
EDIT: It could also be used not only for autoscaling but also for self-healing. If some instance crashed and is not responding anymore, then spin up a new one.
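Self-healing then reduces to a health probe feeding the same provisioning path: if the probe fails, spin up a replacement. A minimal check (the URL and timeout are placeholders):

```python
import urllib.error
import urllib.request

def is_healthy(url, timeout=3.0):
    """Treat an instance as dead unless its health endpoint answers 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

In practice you'd want a few consecutive failures before replacing a node, to avoid flapping on transient network blips (the consul heartbeat delays mentioned above being a good example).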
I wouldn't disqualify the tool for a problem in one of the modules either. Bugs happen :)
A lot of the time when people want auto-scaling, what strikes me is that most of them wouldn't have needed auto-scaling if they'd picked a cheaper provider to begin with. Often they could pick a dedicated provider, spin up 3 times as much capacity and still pay less.
1. Even if DigitalOcean is the best cloud provider today -- I'm not saying it is or isn't -- the probability it is the best that will ever be is approximately zero. The landscape changes. Heterogeneity among cloud providers is increasing and anyway latency will always be a function of the physical location of data centers where the data is sharded.
2. It's the right alignment for an open source project because it is driven by the broad interests of developers rather than the narrow needs of a single company. DigitalOcean may change its pricing policy. It may cease to exist. It may make breaking API changes. All for legitimate business reasons orthogonal to those of particular developers. If their autoscaling code is platform independent, then that's not a crisis.
Good luck.
Edit:
Commented more.
You can either use VM snapshots or our metadata service (cloud-init compatible) to provision new instances automatically. We also have an API, but you don't need it for autoscaling.
We're a European cloud provider (based in the Netherlands) and don't have PoPs in the US though.
When you use AWS you are paying for more than just a cheap VPS, which is essentially all DO is. It's comparing fast food to a fancy steak house. You can get meat at both, but at one of them it's microwaved. Not to say DO isn't great; it gets the job done and provides a valued service, but your money gets you what your money gets you.
It's really hard to find scenarios where AWS isn't ridiculously overpriced.
Consider that Digital Ocean is also an expensive alternative. Still, I deploy caching proxies on DO for some clients who, for various reasons, insist on using AWS, because you can save a lot by caching on DO droplets rather than paying AWS bandwidth costs for all your traffic (once you serve more than 1-2TB a month out of AWS, you can start saving money that way).
It would allow you to add nodes with the packages you need extremely easily. In one or two hours you could write a cron job that adds and removes nodes as needed.
It is not a magic bullet solution but it fixes 80% of the problem with 20% of the effort.
Basically, my docker cluster starts from 3 nodes and as the cluster gets filled with containers I automatically add new nodes on Digital Ocean to increase the cluster capacity.
Our team worked for a day to make it happen, but we got excellent support from their team.
There are more options, but I think I would go with that one.
You can scale compute power... the main issue is: how do you scale bandwidth at Digital Ocean?
I believe you will need just a very thin layer on top of it to add droplets and you can call it done.
I don't have a clue how much harder this would be, but a price difference of about 300% looks much more interesting.
Seriously, building an AWS clone on DO infrastructure is just not possible. Even if it were, it would actually end up more expensive than AWS for small to medium projects. When you use AWS you are not paying for just one virtual server, you are paying for an infrastructure. You would have to replicate parts of that infrastructure, which would require a few separate servers, and you couldn't share those costs with other users.
Using DO instead of dedicated servers for this would be like building a Uber competitor using the Uber API.
Edited: I overlooked the fact that neither Hetzner nor OVH provides a useful (for this project) way to order a dedicated server via API. They do offer an order API, but the spin-up time is too long.
That said, one of the reasons to use providers like Hetzner and OVH is that as I've pointed out elsewhere, the price difference is so large that most people who "need" auto-scaling at places like AWS would pay less if they just ordered 2x to 3x as much capacity at a place like Hetzner and left it on 24/7.
Very few people have loads that are genuinely spiky enough to save enough from auto-scaling to make up for the massive cost difference.