For example, I have a Next.js project currently deployed via Vercel and it's about 40-60 seconds before becoming available. I have heard stories of several minutes to deploy and would love to hear what kind of spread there is.
1-2 days. Multiple senior engineers cherry picking commits into a release branch with even more seniors doing attestation. It’s a company-wide effort that happens every sprint. We have “staff” SREs who can’t figure out automated releases.
Sometimes there is also attestation for things like "does the design and implementation of the new service's architecture comply with security policy -- does it have the approval of the security representative?" Some of these things can be automated away with tooling -- e.g. install a tool that checks for glaring container security vulnerabilities at code review time, and block changes from being merged or deployed into dev or staging environments if there are high or critical severity issues, with some manual process to override the inevitable false positives when the tooling makes a silly decision.
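As an illustration, that kind of merge gate can be a small policy check over the scanner's findings. The report shape and the override label name below are hypothetical, not any specific tool's format:

```python
# Hypothetical pre-merge gate: look at a scanner's findings and block the
# change on HIGH/CRITICAL severities, unless a human has applied an explicit
# override label to deal with false positives.

def should_block_merge(findings, override_labels):
    """Return True if the change should be blocked from merging."""
    blocking = [f for f in findings if f["severity"] in ("HIGH", "CRITICAL")]
    if not blocking:
        return False
    # Manual escape hatch for when the tooling makes a silly decision.
    return "security-override-approved" not in override_labels

findings = [
    {"id": "CVE-2024-0001", "severity": "LOW"},
    {"id": "CVE-2024-0002", "severity": "CRITICAL"},
]
assert should_block_merge(findings, set()) is True
assert should_block_merge(findings, {"security-override-approved"}) is False
```

The point is that the automated rule handles the common case, while the label keeps a human in the loop for the exceptions.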
All that said, it isn't necessarily practical or cost-effective to attempt to automate everything. Skilled human expert review of proposed designs and implementations for security flaws will likely do a much, much more effective job than relying on tools alone. So it's quite reasonable for designs and implementations to require independent security review and approval (or "attestation") in contexts where security is critical. But if things like security review of the implementation (let alone the design) are being done while cherry-picking commits onto a release branch, that sounds far too late in the lifecycle.
Another familiar example of attestation is code review, in projects that require all changes to be approved by one (or perhaps more) reviewers. By approving a PR, the reviewer "attests" that the change looks good to them: that it implements the stated requirements, doesn't have any obvious flaws, etc. High quality code review is incredibly valuable and requires manual effort, but it can be structured into sensible release and deployment processes, particularly by taking advantage of GitHub / GitLab etc.
This reviewing and "attesting" activity can happen much earlier in the lifecycle than deployment or activation.
To deploy the entire system, maybe it'd take a few hours: there'd be 1-2 dozen services, a few databases, probably 20+ external integrations. Usually we'd only deploy the services or components that were changing. Deploying a single service would only take a few minutes, even with human in loop manually triggering the deploy scripts.
Deploying changes was decoupled from activating changes, to avoid outages due to deployments. There were two instances of the system running in production at all times, deployed to two datacentres. It was one giant monolithic blue-green deployment sitting behind a customer-facing load balancer. Suppose the "blue" prod system is currently the live instance of the system and "green" is the dark instance. You'd deploy your new release of your component to green, then once deployment was complete and seemed stable, someone would pull the big lever on the load balancer to start forwarding customer traffic to the green instance. For a while both prod instances would receive customer traffic, until all the timeouts for customer sessions being served by the blue instance kicked in, and they established new sessions with green. Then green would be live and blue would be fully dark. It'd usually take around 5 minutes or so to completely drain traffic from an instance.
If you saw error rates spike on some component and wanted to abort, then you'd need to jam the big metaphorical lever on the load balancer the other way to direct all the traffic back again. Might take 5 minutes or so, again governed by the client session timeouts designed into the system. Usually the technical speed wasn't the bottleneck -- it's more like it'd take 15 to 60 minutes to get the business stakeholders into a room to make a decision on if they were willing to live with the errors or wanted to roll back to the old version.
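The lever-and-drain mechanics above could be sketched roughly like this. The class, names, and constant are invented for illustration; a real load balancer exposes this quite differently:

```python
# Toy model of the "big lever": flipping the live colour only changes where
# NEW sessions go; existing sessions drain as their timeouts expire, so for
# a while both instances serve traffic. Cutover and rollback are the same
# operation in opposite directions.

SESSION_TIMEOUT_SECS = 5 * 60  # governs how long a full drain takes

class BlueGreenSwitch:
    def __init__(self):
        self.live = "blue"   # currently serving customer traffic

    @property
    def dark(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, release):
        # New releases always go to the dark instance first.
        print(f"deploying {release} to {self.dark}")

    def flip(self):
        # Pull the lever: the dark instance becomes live. The previous live
        # instance keeps serving old sessions for up to SESSION_TIMEOUT_SECS.
        self.live = self.dark

switch = BlueGreenSwitch()
switch.deploy("v42")          # lands on green while blue stays live
switch.flip()                 # cutover: green live, blue draining
assert switch.live == "green"
switch.flip()                 # rollback is just the same lever, reversed
assert switch.live == "blue"
```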
In this context the real bottleneck wasn't deployment or activation time, they were both fine. The bottleneck was on the pre-release test process in staging. There was a single staging environment for dozens of services owned and maintained by different teams, which would all be tested manually in lockstep. Changes had to be planned and coordinated weeks or months in advance, to get a test window. Releases happened every four weeks or so, if your change wasn't stable in time to enter the big heavy testing phase in the integrated staging environment, you missed the boat and you had to wait 4 weeks for another try.
Exactly as you say, there is a time window where both the old and new system write to the same data store. Both old and new systems, and the details of the deployment, need to be designed to tolerate this. Even if there is no change to the database schema, you need to think through what will happen if the old version of a component reads data written to the database by a newer version of that same component, or vice versa. Similar considerations if you need to roll back to the old version after the new version has run in production for a few hours, but the newly written data is still there. This can all be planned out and tested in staging.
I don't think this is unique to the blue / green deployment pattern. If you did a rolling deployment to upgrade app servers in a pool behind some customer-traffic facing load balancer, there would be a time window when both old and new versions of your app servers are all attached to your database. Same fundamental problem.
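A minimal sketch of that read-tolerance idea, with field and function names invented for illustration: the old reader ignores fields it doesn't know about, and the new reader supplies a default when a field written by old code is missing.

```python
# Two versions of the same component reading one shared data store during
# (or after) a deployment window. "priority" is a field only v2 knows about.

def read_order_v1(record):
    # v1 predates "priority"; unknown keys are simply ignored.
    return {"id": record["id"], "total": record["total"]}

def read_order_v2(record):
    # v2 must tolerate rows written by v1, which lack "priority".
    return {
        "id": record["id"],
        "total": record["total"],
        "priority": record.get("priority", "normal"),
    }

row_from_v2 = {"id": 1, "total": 100, "priority": "high"}  # written by new code
row_from_v1 = {"id": 2, "total": 50}                       # written by old code

# Old code reading new data: works, extra field is ignored (and survives
# a rollback, since v1 never needed it).
assert read_order_v1(row_from_v2) == {"id": 1, "total": 100}
# New code reading old data: works, default fills the gap.
assert read_order_v2(row_from_v1)["priority"] == "normal"
```

This is exactly the kind of behaviour that can be planned out and tested in staging before the deployment window.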
The good news is, we have an engineer putting in about 70% of his time on converting this to Terraform, and every time I do a deployment, it gets a little quicker and easier.
If there's no database migration, then maybe a minute or two (via GitHub Actions).
If there is a database migration it might take a while depending on how much data needs to get moved around.
Adding columns usually takes no time. Adding indexes to a big table takes a few hours.
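For what it's worth, on PostgreSQL the usual way to keep an hours-long index build from blocking writes on a big table is `CREATE INDEX CONCURRENTLY`. A small illustrative sketch, with table and column names made up:

```python
# Illustrative Postgres migration statements. Adding a column (without a
# volatile default) is a metadata-only change and is effectively instant;
# building an index on a big table is the slow part, and CONCURRENTLY trades
# extra build time for not holding a write lock during the build.

add_column = "ALTER TABLE orders ADD COLUMN priority text;"

# A plain CREATE INDEX would block writes for the whole build;
# CONCURRENTLY lets normal traffic continue while the index is built.
add_index = (
    "CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);"
)

assert add_column.startswith("ALTER TABLE")
assert "CONCURRENTLY" in add_index
```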
Their rebuild-from-clean documentation worked in 2 out of 3 environments... unfortunately production was the one it didn't work in. At this point we're like fuck it.
At least it's not Hibernate(tm).
All told, maybe 30 minutes from writing code to running in production. That turnaround is extremely rare, only when an urgent hotfix is needed. Most releases are heavily tested in staging for several days before moving to production.
If you are in a company that has a consultancy, consultants, freelancers and permanent employees, it can take ages, because the consultancy, consultants and freelancers are all competing with each other while sucking money out of the company. Every deployment in that situation ends in disaster because they never work with each other; the permanent employees suffer and start leaving for better options.
If you are in a company with talented permanent employees and good visionary leaders who work in the spirit of team building and teamwork, I can assure you it will be as easy as pie: people work with each other, get good results and take responsibility.
A piece of advice: look for silos in your company and make them work as a team, and deployments will take less time.
Development time can be longer, but deployment should always be easy if you have the right people. Try to hire the right people for the right job.
Deployment nowadays is mostly cloud-managed, either in-house or with service providers like AWS, Google or Azure, which largely offer serverless solutions since most databases and services are available as a service. A good DevOps engineer can do it easily without taking much time, provided your team has people who don't work in silos and compete with each other.
We have Dev, Test, & 10+ prod regions. The service I work on takes about an hour to run tests in each region, but that involves almost 10 years of test automation, building a custom AMI on EC2, and deploying. There is also cross region AMI copying which slows things down.
To bootstrap a new region takes about 2 months for the entire product, with developers kinda working on it in the background. My service takes about a week's worth of work, but lots of external dependencies and issues pop up. We do this about once a year, so it's almost not worth optimizing for.
(note this is with Vercel's [proprietary] build cache, it takes longer when there's no build cache)
Lambda/Cloud Functions code: testing 1m-5m, deployment <3m. We use NX for our monorepo so we usually only deploy a fraction of all of our serverless code.
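A toy version of that "deploy only what changed" computation. Real tools like Nx derive this from the full project graph (including shared libraries), whereas this sketch only matches path prefixes, and the repo layout is invented:

```python
# Map changed file paths to the services that own them. This deliberately
# ignores shared-library dependencies, which a real project graph would
# also propagate to every dependent service.

def affected_services(changed_files, service_dirs):
    return sorted({svc
                   for f in changed_files
                   for svc, prefix in service_dirs.items()
                   if f.startswith(prefix)})

service_dirs = {"billing": "apps/billing/", "auth": "apps/auth/"}
changed = ["apps/billing/src/invoice.ts", "libs/shared/util.ts"]

# Only billing gets redeployed; the shared-lib change would need graph
# analysis to be attributed correctly.
assert affected_services(changed, service_dirs) == ["billing"]
```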
Containers: testing 5m-1h (depending on build time and type of tests), deployment 3m-10m.
Migrations: anywhere from 1m up to 1h depending on the tables/migration type and the number of affected PostgreSQL instances.
Infrastructure: anywhere from 2m up to 8h depending on what's being changed.
We can see in some of the other answers that commenters assume wildly different meanings for this, from when code is pushed to it being available in production, to how long it takes to start the service.
My experience covers a lot of that spectrum. Back at a previous place, if you were unlucky -- i.e. it was the start of the quarter and you made a small change that took you 5 minutes -- you'd have to wait a quarter of a year minus 5 minutes until your change was deployed to production, during a weekend night in which the application would be taken offline (well, not fully offline, but put into a read-only state, with queues accepting write operations but not delivering the messages). This was for a large telecom provider's backend (ordering) systems.
To nowadays where the smallest service for our SaaS starts up in a second or two, depending on what you count. Does the cluster have room for the new pod? Yeah it's seconds. Does k8s think it needs to add a new node? Well you're gonna wait a bit. And yes some services take minutes to initialize. But no matter what, customers won't notice. Even if there's a database update that's included in the changeset that gets deployed and that runs hours per tenant (times thousands of tenants), the services will be available during that time. It has to be coded that way and be deployed in stages.
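That staged, per-tenant approach might be sketched like this. All names are illustrative, and the real thing would run the slow per-tenant step in the background rather than in a simple loop:

```python
# Staged per-tenant data migration that keeps the service available: each
# tenant is migrated while old code keeps serving it, then flipped to the
# new code path individually. A migration that takes hours per tenant
# therefore never requires downtime for anyone.

def migrate_in_stages(tenants, migrate_tenant, enable_new_path):
    migrated = []
    for tenant in tenants:
        migrate_tenant(tenant)     # may take hours; old code path still serves
        enable_new_path(tenant)    # per-tenant flag flip, effectively instant
        migrated.append(tenant)
    return migrated

log = []
result = migrate_in_stages(
    ["acme", "globex"],
    migrate_tenant=lambda t: log.append(f"migrated {t}"),
    enable_new_path=lambda t: log.append(f"flag on for {t}"),
)
assert result == ["acme", "globex"]
assert log == ["migrated acme", "flag on for acme",
               "migrated globex", "flag on for globex"]
```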
I would probably condense my definition to how quickly can you go from new code added to someone seeing that code running (in web dev terms essentially available on some URL). So even if you had different layers of pipeline, the build artifact stage would still be included, because you can't deploy without it.
The backend services take about two minutes.
(It's my own company, so I took the time to make deployments as fast as possible since I value short feedback loops)
- CI steps to build and package containers usually take 1-10 minutes, depending on whether caches are used
- running unit tests can take 1-5 minutes, depending on the system and infrastructure
- running integration tests can take 5-30 minutes, depending on the system and infrastructure
- scanning the build artifacts can take around 5 minutes (e.g. Trivy)
- uploading them to a container registry will usually take 1-5 minutes, depending on the network speed
- launching new containers will probably take 1-10 minutes, depending on whether there's DB migrations etc.
So, in short, typically under an hour, sometimes well under an hour.
Things that are especially useful on a technical level:
- a package cache (e.g. the Maven ".m2" folder) or a self-hosted package repository (like Sonatype Nexus), maybe both
- some sort of build cache, which you largely get out of the box when working with containers, especially if you do multi-stage builds with an optimized build order/layers (e.g. first the dependencies that change infrequently, then the code)
- a setup where you parallelize lots of the build steps and can add new runner/follower servers for actually doing the steps
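As an aside, one common way such caches are keyed is by hashing the dependency manifest, so the cached package set is reused exactly until the dependencies change. A hypothetical sketch (file contents and key format invented):

```python
# Derive a cache key from the lockfile contents: same dependencies -> same
# key -> cache hit; any dependency change -> new key -> rebuild.

import hashlib

def cache_key(lockfile_contents: bytes, prefix: str = "deps") -> str:
    digest = hashlib.sha256(lockfile_contents).hexdigest()[:16]
    return f"{prefix}-{digest}"

lock_v1 = b'{"dependencies": {"left-pad": "1.3.0"}}'
lock_v2 = b'{"dependencies": {"left-pad": "1.3.0", "lodash": "4.17.21"}}'

assert cache_key(lock_v1) == cache_key(lock_v1)  # unchanged deps: cache hit
assert cache_key(lock_v1) != cache_key(lock_v2)  # changed deps: rebuild
```

The same idea underlies Docker layer caching: layers are invalidated only when their inputs change, which is why copying the dependency manifest before the source code pays off.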
The human aspects:
- the people delivering the software might not be the same ones running it, registering a release with instructions might take 30-60 minutes
- the people who will run the new version might need to change all of the necessary configuration for the new version, which might take another 30 or so minutes
- the people who will demand that a new version be launched on any given infrastructure might need to be made aware of the new release, which might take around 30 minutes
- before anything goes into prod or moves across different environments, testing and further fixes might be necessary, which can take from a day to a few weeks
This might be relevant to something closer to a consulting scenario, or when working across org units, but this is where you'll spend the majority of the time.
Personally, I've been in scenarios where I've deployed new versions to prod in minutes, and I've seen cases where new releases of software hadn't been deployed to prod in months, despite technically being delivered. Everything from fully automated pipelines to shipping manually built binaries (thankfully that was years ago, for nothing important; I promptly set up proper CI/CD regardless).
Then again, the kinds of software that people work on might differ a lot. Here's an interesting post from a while ago: https://news.ycombinator.com/item?id=18442941
...right?