For example, I have a Next.js project currently deployed via Vercel and it's about 40-60 seconds before becoming available. I have heard stories of several minutes to deploy and would love to hear what kind of spread there is.
1-2 days. Multiple senior engineers cherry picking commits into a release branch with even more seniors doing attestation. It’s a company-wide effort that happens every sprint. We have “staff” SREs who can’t figure out automated releases.
Sometimes there is also attestation for things like "does the design and implementation of the new service's architecture comply with security policy -- does it have the approval of the security representative?" Some of these things can be automated away with tooling -- e.g. install a tool that checks for glaring container security vulnerabilities at code review time, and block changes from being merged or deployed into dev or staging environments if there are high or critical severity issues, with some manual process to override the inevitable false positives when the tooling makes a silly decision.
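As an illustration, that kind of merge gate can be a small policy check over the scanner's findings. The report shape and the override label name below are hypothetical, not any specific tool's format:

```python
# Hypothetical pre-merge gate: look at a scanner's findings and block the
# change on HIGH/CRITICAL severities, unless a human has applied an explicit
# override label to deal with false positives.

def should_block_merge(findings, override_labels):
    """Return True if the change should be blocked from merging."""
    blocking = [f for f in findings if f["severity"] in ("HIGH", "CRITICAL")]
    if not blocking:
        return False
    # Manual escape hatch for when the tooling makes a silly decision.
    return "security-override-approved" not in override_labels

findings = [
    {"id": "CVE-2024-0001", "severity": "LOW"},
    {"id": "CVE-2024-0002", "severity": "CRITICAL"},
]
assert should_block_merge(findings, set()) is True
assert should_block_merge(findings, {"security-override-approved"}) is False
```

The point is that the automated rule handles the common case, while the label keeps a human in the loop for the exceptions.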
All that said, it isn't necessarily practical or cost-effective to attempt to automate everything. Skilled human expert review of proposed designs and implementations for security flaws will likely do a much, much more effective job than relying on tools alone. So it's quite reasonable for designs and implementations to require independent security review and approval (or "attestation") in contexts where security is critical. But if things like security review of the implementation (let alone the design) are being done while cherry-picking commits onto a release branch, that sounds far too late in the lifecycle.
Another familiar example of attestation is code review, in projects that require all changes to be approved by one (or perhaps more) reviewers. By approving a PR, the reviewer "attests" that the change looks good to them: that it implements the stated requirements, doesn't have any obvious flaws, etc. High quality code review is incredibly valuable and requires manual effort, but it can be structured into sensible release and deployment processes, particularly by taking advantage of GitHub / GitLab etc.
This reviewing and "attesting" activity can happen much earlier in the lifecycle than deployment or activation.
To deploy the entire system, maybe it'd take a few hours: there'd be 1-2 dozen services, a few databases, probably 20+ external integrations. Usually we'd only deploy the services or components that were changing. Deploying a single service would only take a few minutes, even with human in loop manually triggering the deploy scripts.
Deploying changes was decoupled from activating changes, to avoid outages due to deployments. There were two instances of the system running in production at all times, deployed to two datacentres. It was one giant monolithic blue-green deployment sitting behind a customer-facing load balancer. Suppose the "blue" prod system is currently the live instance of the system and "green" is the dark instance. You'd deploy your new release of your component to green, then once deployment was complete and seemed stable, someone would pull the big lever on the load balancer to start forwarding customer traffic to the green instance. For a while both prod instances would receive customer traffic, until all the timeouts for customer sessions being served by the blue instance kicked in, and they established new sessions with green. Then green would be live and blue would be fully dark. It'd usually take around 5 minutes or so to completely drain traffic from an instance.
If you saw error rates spike on some component and wanted to abort, then you'd need to jam the big metaphorical lever on the load balancer the other way to direct all the traffic back again. Might take 5 minutes or so, again governed by the client session timeouts designed into the system. Usually the technical speed wasn't the bottleneck -- it's more like it'd take 15 to 60 minutes to get the business stakeholders into a room to make a decision on if they were willing to live with the errors or wanted to roll back to the old version.
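The lever-and-drain mechanics above could be sketched roughly like this. The class, names, and constant are invented for illustration; a real load balancer exposes this quite differently:

```python
# Toy model of the "big lever": flipping the live colour only changes where
# NEW sessions go; existing sessions drain as their timeouts expire, so for
# a while both instances serve traffic. Cutover and rollback are the same
# operation in opposite directions.

SESSION_TIMEOUT_SECS = 5 * 60  # governs how long a full drain takes

class BlueGreenSwitch:
    def __init__(self):
        self.live = "blue"   # currently serving customer traffic

    @property
    def dark(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, release):
        # New releases always go to the dark instance first.
        print(f"deploying {release} to {self.dark}")

    def flip(self):
        # Pull the lever: the dark instance becomes live. The previous live
        # instance keeps serving old sessions for up to SESSION_TIMEOUT_SECS.
        self.live = self.dark

switch = BlueGreenSwitch()
switch.deploy("v42")          # lands on green while blue stays live
switch.flip()                 # cutover: green live, blue draining
assert switch.live == "green"
switch.flip()                 # rollback is just the same lever, reversed
assert switch.live == "blue"
```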
In this context the real bottleneck wasn't deployment or activation time, they were both fine. The bottleneck was on the pre-release test process in staging. There was a single staging environment for dozens of services owned and maintained by different teams, which would all be tested manually in lockstep. Changes had to be planned and coordinated weeks or months in advance, to get a test window. Releases happened every four weeks or so, if your change wasn't stable in time to enter the big heavy testing phase in the integrated staging environment, you missed the boat and you had to wait 4 weeks for another try.
Exactly as you say, there is a time window where both the old and new system write to the same data store. Both old and new systems, and the details of the deployment, need to be designed to tolerate this. Even if there is no change to the database schema, you need to think through what will happen if the old version of a component reads data written to the database by a newer version of that same component, or vice versa. Similar considerations if you need to roll back to the old version after the new version has run in production for a few hours, but the newly written data is still there. This can all be planned out and tested in staging.
I don't think this is unique to the blue / green deployment pattern. If you did a rolling deployment to upgrade app servers in a pool behind some customer-traffic facing load balancer, there would be a time window when both old and new versions of your app servers are all attached to your database. Same fundamental problem.
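A minimal sketch of that read-tolerance idea, with field and function names invented for illustration: the old reader ignores fields it doesn't know about, and the new reader supplies a default when a field written by old code is missing.

```python
# Two versions of the same component reading one shared data store during
# (or after) a deployment window. "priority" is a field only v2 knows about.

def read_order_v1(record):
    # v1 predates "priority"; unknown keys are simply ignored.
    return {"id": record["id"], "total": record["total"]}

def read_order_v2(record):
    # v2 must tolerate rows written by v1, which lack "priority".
    return {
        "id": record["id"],
        "total": record["total"],
        "priority": record.get("priority", "normal"),
    }

row_from_v2 = {"id": 1, "total": 100, "priority": "high"}  # written by new code
row_from_v1 = {"id": 2, "total": 50}                       # written by old code

# Old code reading new data: works, extra field is ignored (and survives
# a rollback, since v1 never needed it).
assert read_order_v1(row_from_v2) == {"id": 1, "total": 100}
# New code reading old data: works, default fills the gap.
assert read_order_v2(row_from_v1)["priority"] == "normal"
```

This is exactly the kind of behaviour that can be planned out and tested in staging before the deployment window.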
The good news is, we have an engineer putting in about 70% of his time on converting this to Terraform, and every time I do a deployment, it gets a little quicker and easier.
If there's no database migration, then maybe a minute or two (via GitHub Actions).
If there is a database migration it might take a while depending on how much data needs to get moved around.
Adding columns usually takes no time. Adding indexes to a big table takes a few hours.
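For what it's worth, on PostgreSQL the usual way to keep an hours-long index build from blocking writes on a big table is `CREATE INDEX CONCURRENTLY`. A small illustrative sketch, with table and column names made up:

```python
# Illustrative Postgres migration statements. Adding a column (without a
# volatile default) is a metadata-only change and is effectively instant;
# building an index on a big table is the slow part, and CONCURRENTLY trades
# extra build time for not holding a write lock during the build.

add_column = "ALTER TABLE orders ADD COLUMN priority text;"

# A plain CREATE INDEX would block writes for the whole build;
# CONCURRENTLY lets normal traffic continue while the index is built.
add_index = (
    "CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);"
)

assert add_column.startswith("ALTER TABLE")
assert "CONCURRENTLY" in add_index
```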
Their rebuild-from-clean documentation worked in 2 out of 3 environments... unfortunately production was the one it didn't work in. At this point we're like fuck it.
At least it's not Hibernate(tm).
All told, maybe 30 minutes from writing code to running in production. That turnaround is extremely rare, only when an urgent hotfix is needed. Most releases are heavily tested in staging for several days before moving to production.
If you are in a company that has a consultancy, consultants, freelancers and permanent employees, it can take ages, because the consultancy, consultants and freelancers are all competing with each other while sucking money out of the company. Every deployment in that situation ends in disaster because they never work with each other; the permanent employees suffer and start leaving for better options.
If you are in a company with talented permanent employees and good visionary leaders who work in the spirit of team building and teamwork, I can assure you it will be as easy as pie: people work with each other, get good results and take responsibility.
A piece of advice: look for silos in your company and make them work as a team, and deployments will take less time.
Development time can be longer, but deployment should always be easy if you have the right people. Try to hire the right people for the right job.
Deployment nowadays is mostly cloud-managed, either in-house or with service providers like AWS, Google or Azure, which largely offer serverless solutions since most databases and services are available as a service. A good DevOps engineer can do it easily without taking much time, provided your team has people who don't work in silos and compete with each other.
We have Dev, Test, & 10+ prod regions. The service I work on takes about an hour to run tests in each region, but that involves almost 10 years of test automation, building a custom AMI on EC2, and deploying. There is also cross region AMI copying which slows things down.
To bootstrap a new region takes about 2 months for the entire product, with developers kinda working on it in the background. My service takes about a week's worth of work, but lots of external dependencies and issues pop up. We do this about once a year, so it's almost not worth optimizing for.
(note this is with Vercel's [proprietary] build cache, it takes longer when there's no build cache)
Lambda/Cloud Functions code: testing 1m-5m, deployment <3m. We use NX for our monorepo so we usually only deploy a fraction of all of our serverless code.
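A toy version of that "deploy only what changed" computation. Real tools like Nx derive this from the full project graph (including shared libraries), whereas this sketch only matches path prefixes, and the repo layout is invented:

```python
# Map changed file paths to the services that own them. This deliberately
# ignores shared-library dependencies, which a real project graph would
# also propagate to every dependent service.

def affected_services(changed_files, service_dirs):
    return sorted({svc
                   for f in changed_files
                   for svc, prefix in service_dirs.items()
                   if f.startswith(prefix)})

service_dirs = {"billing": "apps/billing/", "auth": "apps/auth/"}
changed = ["apps/billing/src/invoice.ts", "libs/shared/util.ts"]

# Only billing gets redeployed; the shared-lib change would need graph
# analysis to be attributed correctly.
assert affected_services(changed, service_dirs) == ["billing"]
```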
Containers: testing 5m-1h (depending on build time and type of tests), deployment 3m-10m.
Migrations: anywhere from 1m up to 1h depending on the tables/migration type and the number of affected PostgreSQL instances.
Infrastructure: anywhere from 2m up to 8h depending on what's being changed.
We can see in some of the other answers that commenters assume wildly different meanings for this, from when code is pushed to it being available in production, to how long it takes to start the service.
My experience covers a lot of that spectrum. Back at a previous place, if you were unlucky -- i.e. it was the start of the quarter and you made a small change that took you 5 minutes -- you'd have to wait a quarter of a year minus 5 minutes until your change was deployed to production, during a weekend night in which the application would be taken offline (well, not fully offline, but put into a read-only state, with queues accepting write operations but not delivering the messages). This was for a large telecom provider's backend (ordering) systems.
To nowadays where the smallest service for our SaaS starts up in a second or two, depending on what you count. Does the cluster have room for the new pod? Yeah it's seconds. Does k8s think it needs to add a new node? Well you're gonna wait a bit. And yes some services take minutes to initialize. But no matter what, customers won't notice. Even if there's a database update that's included in the changeset that gets deployed and that runs hours per tenant (times thousands of tenants), the services will be available during that time. It has to be coded that way and be deployed in stages.
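That staged, per-tenant approach might be sketched like this. All names are illustrative, and the real thing would run the slow per-tenant step in the background rather than in a simple loop:

```python
# Staged per-tenant data migration that keeps the service available: each
# tenant is migrated while old code keeps serving it, then flipped to the
# new code path individually. A migration that takes hours per tenant
# therefore never requires downtime for anyone.

def migrate_in_stages(tenants, migrate_tenant, enable_new_path):
    migrated = []
    for tenant in tenants:
        migrate_tenant(tenant)     # may take hours; old code path still serves
        enable_new_path(tenant)    # per-tenant flag flip, effectively instant
        migrated.append(tenant)
    return migrated

log = []
result = migrate_in_stages(
    ["acme", "globex"],
    migrate_tenant=lambda t: log.append(f"migrated {t}"),
    enable_new_path=lambda t: log.append(f"flag on for {t}"),
)
assert result == ["acme", "globex"]
assert log == ["migrated acme", "flag on for acme",
               "migrated globex", "flag on for globex"]
```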
I would probably condense my definition to how quickly can you go from new code added to someone seeing that code running (in web dev terms essentially available on some URL). So even if you had different layers of pipeline, the build artifact stage would still be included, because you can't deploy without it.
The backend services take about two minutes.
(It's my own company, so I took the time to make deployments as fast as possible since I value short feedback loops)
- CI steps to build and package containers usually take 1-10 minutes, depending on whether caches are used
- running unit tests can take 1-5 minutes, depending on the system and infrastructure
- running integration tests can take 5-30 minutes, depending on the system and infrastructure
- scanning the build artifacts can take around 5 minutes (e.g. Trivy)
- uploading them to a container registry will usually take 1-5 minutes, depending on the network speed
- launching new containers will probably take 1-10 minutes, depending on whether there's DB migrations etc.
So, in short, typically under an hour, sometimes well under an hour.
Things that are especially useful on a technical level:
- a package cache (e.g. the Maven ".m2" folder) or a self-hosted package repository (like Sonatype Nexus), maybe both
- some sort of build cache, which you largely get out of the box when working with containers, especially if you do multi-stage builds with an optimized build order/layers (e.g. first the dependencies that change infrequently, then the code)
- a setup where you parallelize lots of the build steps and can add new runner/follower servers for actually doing the steps
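As an aside, one common way such caches are keyed is by hashing the dependency manifest, so the cached package set is reused exactly until the dependencies change. A hypothetical sketch (file contents and key format invented):

```python
# Derive a cache key from the lockfile contents: same dependencies -> same
# key -> cache hit; any dependency change -> new key -> rebuild.

import hashlib

def cache_key(lockfile_contents: bytes, prefix: str = "deps") -> str:
    digest = hashlib.sha256(lockfile_contents).hexdigest()[:16]
    return f"{prefix}-{digest}"

lock_v1 = b'{"dependencies": {"left-pad": "1.3.0"}}'
lock_v2 = b'{"dependencies": {"left-pad": "1.3.0", "lodash": "4.17.21"}}'

assert cache_key(lock_v1) == cache_key(lock_v1)  # unchanged deps: cache hit
assert cache_key(lock_v1) != cache_key(lock_v2)  # changed deps: rebuild
```

The same idea underlies Docker layer caching: layers are invalidated only when their inputs change, which is why copying the dependency manifest before the source code pays off.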
The human aspects:
- the people delivering the software might not be the same ones running it, registering a release with instructions might take 30-60 minutes
- the people who will run the new version might need to change all of the necessary configuration for the new version, which might take another 30 or so minutes
- the people who will demand that a new version be launched on any given infrastructure might need to be made aware of the new release, which might take around 30 minutes
- before anything goes into prod or moves across different environments, testing and further fixes might be necessary, which can take from a day to a few weeks
This might be relevant to something closer to a consulting scenario, or when working across org units, but this is where you'll spend the majority of the time.
Personally, I've been in scenarios where I've deployed new versions to prod in minutes, and I've seen cases where new releases of software hadn't been deployed to prod in months, despite technically being delivered. Everything from fully automated pipelines to shipping manually built binaries (thankfully that was years ago, for nothing important; I promptly set up proper CI/CD regardless).
Then again, the kinds of software that people work on might differ a lot. Here's an interesting post from a while ago: https://news.ycombinator.com/item?id=18442941
...right?