Zero downtime is usually tricky and depends greatly on your architecture. I can think of two overall paths to success, but there are more (and these may or may not work for your use case).
1) Prod A / Prod B flip. Your current code runs in prod; call that Prod A. Bring up an entire copy of your stack with the new code version as Prod B. Once it is up and stable, you switch traffic over to it. Once all traffic is switched, you kill the old servers. The tricky part with this is state: do you need to worry about state loss? If you keep one persistent database that both Prod A and Prod B hit, you can do this without too much trouble. Make sure you keep no state on your individual app servers (session cache etc.).
2) Slow roll. Say you have 10 web servers behind a load balancer. You take 1 down, upgrade it, then add it back. Repeat until all 10 are upgraded. The trick here is what happens if a user hits code version A, then B, then back to A. If it doesn't matter, easy. If it matters, you may need to pin clients to machines at the load balancer, so that no one who has seen the new B will ever be sent to an old server still on A.
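To make the slow roll concrete, here is a minimal sketch in Python. It is a toy simulation, not a real load balancer: `Server`, `LoadBalancer`, `rolling_upgrade`, and the `seen` stickiness table are all hypothetical names invented for illustration, and versions are plain integers (1 for A, 2 for B). It only shows the rule "a client who has seen B never goes back to A".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Server:
    name: str
    version: int              # 1 = old code "A", 2 = new code "B"
    in_rotation: bool = True


@dataclass
class LoadBalancer:
    servers: List[Server]
    # Newest version each client has been served (hypothetical stickiness table).
    seen: Dict[str, int] = field(default_factory=dict)
    _cursor: int = 0

    def route(self, client_id: str) -> Server:
        """Round-robin over servers that are in rotation and not older
        than anything this client has already been served."""
        floor = self.seen.get(client_id, 0)
        eligible = [s for s in self.servers
                    if s.in_rotation and s.version >= floor]
        if not eligible:
            raise RuntimeError("no eligible server for " + client_id)
        server = eligible[self._cursor % len(eligible)]
        self._cursor += 1
        self.seen[client_id] = server.version
        return server


def rolling_upgrade(
    lb: LoadBalancer,
    new_version: int,
    between_steps: Optional[Callable[[LoadBalancer], None]] = None,
) -> None:
    """Take one server out, upgrade it, add it back; repeat for all servers."""
    for server in lb.servers:
        server.in_rotation = False      # stop sending it traffic
        server.version = new_version    # stand-in for the real deploy step
        server.in_rotation = True       # put it back behind the balancer
        if between_steps is not None:
            between_steps(lb)           # traffic keeps flowing mid-upgrade


if __name__ == "__main__":
    lb = LoadBalancer([Server(f"web{i}", 1) for i in range(10)])
    history: Dict[str, List[int]] = {}

    def traffic(balancer: LoadBalancer) -> None:
        for client in ("alice", "bob", "carol"):
            history.setdefault(client, []).append(balancer.route(client).version)

    rolling_upgrade(lb, 2, traffic)
    print(history)  # each client's version list never goes backwards
```

In a real setup the pinning would live in the load balancer's own config (cookie or source-IP stickiness, for example) rather than in application code. The sketch is only meant to show the invariant you are trying to hold.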
What a lot of places actually do:
1) Push out new code
2) Shut down all prod servers at once
3) Restart them
Maybe during a weekly maintenance window, maybe at 2am, maybe at noon, depending on the company and clients. The basic assumption is "Meh, people will reload if the page is down for a few minutes."
Even though 0 downtime is the "right" way to do stuff, it seems like the ops level of many places is not that high.
The fact that you have scripts and are making some kind of attempt at less downtime puts you in the top 25% of the internet.
The CNM (Container Network Model) is realized using the newly introduced experimental networking solutions, which include: Network & Service UI, pluggable service discovery, and native multi-host cross-container connectivity. More information on trying out these experimental features: https://github.com/docker/docker/blob/master/experimental/ne... https://github.com/docker/docker/blob/master/experimental/RE...
A service (aka endpoint) owns the networking configs (such as IP address, MAC address, etc.), and the container that backs the service can be swapped while retaining the same networking and service configs. Hence swapping a container from an older to a newer version of the app server is just a matter of detaching the service from the older container and attaching the same service to the newer container. Also, please note that a container can belong to multiple networks, and each container can publish different services in different networks.
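As a rough illustration of the detach/attach idea (a toy model in Python, not the libnetwork or Docker API), the point is that the service object holds the network identity and the container is just a replaceable field. `Service`, `swap_backend`, the addresses, and the container names are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    """Toy stand-in for a CNM service/endpoint: the service, not the
    container, owns the network identity."""
    name: str
    network: str
    ip_address: str
    mac_address: str
    container: Optional[str] = None   # name of the backing container, if any

    def detach(self) -> Optional[str]:
        """Remove the backing container and return its name."""
        old, self.container = self.container, None
        return old

    def attach(self, container: str) -> None:
        """Back this service with a container; the IP and MAC are untouched."""
        if self.container is not None:
            raise RuntimeError(
                f"{self.name} is still attached to {self.container}")
        self.container = container


def swap_backend(service: Service, new_container: str) -> Optional[str]:
    """Detach the service from the old container and attach it to the new one.
    Returns the name of the container that was replaced."""
    old = service.detach()
    service.attach(new_container)
    return old


if __name__ == "__main__":
    app = Service("app", "backend", "10.0.0.5", "02:42:0a:00:00:05",
                  container="app-v1")
    replaced = swap_backend(app, "app-v2")
    print(replaced, "->", app.container, "at", app.ip_address)
```

Note that in this toy model the service has no backend for a moment between `detach` and `attach`. Whether the real tooling hides that gap is something to check against the experimental docs linked above.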
With this simple and composable CNM design, your use case can be mapped onto the CNM. A quick diagram explaining the concept: https://docs.google.com/drawings/d/1LvD94UwfinQelpEqT9BaRYmi...
We can add more detailed documentation for this specific use case. Please join us at https://github.com/docker/libnetwork and in the #docker-network channel on freenode IRC to discuss this in more detail.
Load balancers, service discovery, IP swapping, etc. all help accomplish this.
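One way to picture the "swap" part is a single pointer that decides which copy of the stack gets traffic, and that only moves once the new copy passes health checks. This is a hedged Python sketch under that assumption: `Stack`, `Router`, `flip`, and the `is_healthy` callback are hypothetical names, not any particular load balancer's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stack:
    name: str
    version: str
    servers: List[str]


class Router:
    """Toy front door: all traffic goes to whichever stack `active` names."""

    def __init__(self, stacks: Dict[str, Stack], active: str) -> None:
        self.stacks = stacks
        self.active = active

    def backend(self) -> Stack:
        """Return the stack currently receiving traffic."""
        return self.stacks[self.active]

    def flip(self, target: str, is_healthy: Callable[[str], bool]) -> bool:
        """Point traffic at `target` only if every one of its servers is
        healthy. Returns True if the flip happened, False otherwise."""
        candidate = self.stacks[target]
        if not all(is_healthy(server) for server in candidate.servers):
            return False              # leave traffic where it is
        self.active = target          # the single switch-over step
        return True


if __name__ == "__main__":
    router = Router(
        {
            "A": Stack("A", "v1", ["a1", "a2"]),
            "B": Stack("B", "v2", ["b1", "b2"]),
        },
        active="A",
    )
    # Hypothetical health check that always passes.
    if router.flip("B", lambda server: True):
        print("now serving", router.backend().version)
```

The old stack is left running here on purpose, so flipping back is just another `flip("A", ...)` call if the new version misbehaves.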
In a multi-container app based on docker with cross-container communication, as structured below:
load balancer → app servers * n → db * 2
How does one deploy new code to the app servers without any downtime?
I'm looking for any docker-based solutions, toolkits or simply tips and tricks, thanks!