Interested to hear about current setups, and how it works for you.
1. https://github.com/gocd/gocd - 6.1k stars
2. https://github.com/Shopify/shipit-engine - 1.2k stars
3. https://github.com/guardian/riff-raff - 252 stars
4. https://github.com/ankyra/escape - 201 stars
5. https://github.com/kiwicom/crane - 92 stars
6. https://github.com/tim-group/orc - 34 stars
7. https://github.com/wballard/starphleet - 19 stars (dead?)
To be honest we tried to avoid the monorepo but it was hellish. Maybe it would have worked if each microservice were larger and our team were larger, but then would they still be microservices?
Giving up and dumping everything into a monorepo isn't going to help at all. At that point you're probably better off abandoning any hope of carefully split, individually managed services.
Same with our consistent logging system.
Libraries are better than unique code everywhere for the same task - allows you to fix a bug once and to do consistency checking.
Wouldn't this urgent need mean that they put the code into the microservice that needs the urgent update, as opposed to going through the effort of making it available for everyone to use?
Anyway, it sounds like you have a distributed monolith. If you cannot maintain and deploy a microservice independently, it should not be a microservice.
We can maintain and deploy them independently, but it was annoying to try to track which version was deployed where and having to check it out independently, etc.
The overhead was incredibly high. So we plopped them all into a single monorepo as sub-projects. We can still update each one individually, but we know that what is live on the website is what is at the head of that branch.
As someone whose last website was a monolith (Clara.io), I feel we are getting the benefits of microservices with little of their downsides now. It is like night and day.
It may be that we have a lot of microservices for the size of our team - 20+ microservices and a team size of around 12.
One microservice per team, so you cut down on intra-team friction, and the team can manage their own releases.
(In other words, you're 100% spot on!)
This is comparable to CloudFormation or Terraform in terms of determining whether something is up-to-date, but more general purpose.
Short version: have DB1 hold the transactional data (data generated while running the system). Have DB2a hold the release-bound data (data about and connected to the code itself: settings, prices, whatever).
Have DB2a have views onto DB1 tables. Version a of the code only "knows about" DB2a, but any transactional CRUD ops hit the tables on DB1.
Now version b of the code just needs to ship/create a DB2b and both a and b can run in parallel.
If you need to change the shape of DB1 tables, those changes need to be backward compatible (can only add nullable columns, no use of "select *", etc).
There are a few details about how to make it fully practical, but that's the gist, and we ran that for about 12 years on a moderately heavily trafficked e-commerce site.
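A minimal sketch of the two-schema idea, using in-memory SQLite with name prefixes standing in for the separate databases (all table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# DB1: transactional data, shared by every running code version.
cur.execute("CREATE TABLE db1_orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")

# DB2a: release-bound data plus views onto DB1, shipped with code version a.
cur.execute("CREATE TABLE db2a_prices (sku TEXT PRIMARY KEY, cents INTEGER)")
cur.execute("CREATE VIEW db2a_orders AS SELECT id, sku, qty FROM db1_orders")

# Version a only "knows about" db2a_* objects...
cur.execute("INSERT INTO db2a_prices VALUES ('widget', 499)")
# ...but transactional writes still land in DB1's tables.
cur.execute("INSERT INTO db1_orders (sku, qty) VALUES ('widget', 2)")

# Version b just ships its own db2b_* layer; a and b run in parallel
# against the same DB1.
cur.execute("CREATE VIEW db2b_orders AS SELECT id, sku, qty FROM db1_orders")

print(cur.execute("SELECT sku, qty FROM db2a_orders").fetchall())
print(cur.execute("SELECT sku, qty FROM db2b_orders").fetchall())
```

Both views see the same order row, which is the point: the transactional tables are shared, while each release owns its release-bound layer.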
This requires a little discipline, but if you follow a few simple rules it's not really that arduous:
- when adding a new column, it must have a default value set, or be nullable
- don't drop any columns
- don't rename any columns
Now, for those last 2, what I really mean is "don't do it in a single release" - if you want to make destructive changes, do it over the course of 2 releases:
- release 1: remove dependencies on the column from the app/API/service
- release 2: perform the database migration with destructive changes
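The "new columns need a default or must be nullable" rule can be seen in a quick SQLite session (table and column names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO customers (name) VALUES ('alice')")

# Safe: the new column has a default, so old code (which never mentions
# it) keeps working, and pre-existing rows get a sensible value.
cur.execute("ALTER TABLE customers ADD COLUMN tier TEXT NOT NULL DEFAULT 'free'")

print(cur.execute("SELECT name, tier FROM customers").fetchall())  # [('alice', 'free')]
```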
It probably sounds more difficult than it actually is :) In reality, I don't make destructive changes that often though.

So ideally you have some kind of monitoring that reports/shows how many services are alive (and where they live in a cluster), how many errors they generate, etc. Then based on some thresholds you can take them out of circulation and let them cool down. If certain kinds of errors occur, or occur at a certain frequency, the system can notify a site reliability engineer (or equivalent) to check it out. Then they can decide if it should be permanently removed, and log an internal support ticket and so forth for the developers or product teams.
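A threshold-based "take it out of circulation" check like the one described can be sketched in a few lines of Python (the window size and error-rate threshold are made-up numbers):

```python
from collections import deque

class InstanceHealth:
    """Track recent request outcomes for one service instance and pull
    it from rotation once its error rate crosses a threshold."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.5):
        self.results = deque(maxlen=window)  # sliding window of outcomes
        self.max_error_rate = max_error_rate

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    @property
    def error_rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def in_rotation(self) -> bool:
        return self.error_rate < self.max_error_rate

h = InstanceHealth(window=10, max_error_rate=0.5)
for _ in range(4):
    h.record(True)
for _ in range(6):
    h.record(False)
print(h.in_rotation())  # error rate 0.6 >= 0.5 -> False
```

A real system would layer alerting (page the SRE) and cool-down/re-admission on top, but the core decision is just a windowed error rate against a threshold.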
Production issues are a part of life. You need to have some visibility on issues and their severity. Every company and tech stack is different, also depending on their SLA's and uptime promises.
Ads not rendering in an app might be less severe than a pump failure at a fuel station, so they have different kinds of monitoring and reaction times to faults. Obviously things like hospitals, banks, airlines/aircraft manufacturers have way different requirements and infrastructure from, say, a system that manages all school libraries for a state/province.
There are too many products and approaches to mention here if you were looking for a list of those. I have one or two favorite approaches and a handful of tools for this kind of stuff, half of which is homemade, so not something you can google. But you can search the general topic and see a few different approaches. "microservices monitoring java" or "microservices monitoring best practice" or something along those lines will get you on a path. Try to find 5 different approaches and reflect on what each one is missing or how they may help you, and then ponder what you would like to see from a reporting system with hundreds/thousands of services.
And then obviously the best lessons will come from production itself.
Good luck!
Luckily, a good CI/CD pipeline makes reversions just as easy as deployments. So even when you have errors, it's easier to correct than if you suddenly discovered "our deployment bash script / ansible playbook isn't as reversible as we thought it was"
The idea of deploying every commit all the way to prod is very questionable.
That said, to know what changes would actually break things you'd ideally have a suite of tests.
If only you could tell my bosses/architects that. They won't listen to me.
Edit: why downvote?
Just because you should be able to release without orchestration doesn't mean you shouldn't be able to watch and track things.
You shouldn't have frequent breaking changes but you should still have the tools to manage when you do.
Then leave and go somewhere where they will. I wasted too much of my life trying to "change things from within", but I finally learned the lesson. If you have no authority but are held accountable, then GTFO.
My theory is your presentation is not compelling. Was your cost-benefit analysis clear? What risk/reward metrics did you highlight?
True, but you absolutely should still be versioning/tagging your releases for each service. It's not to provide sophisticated orchestration; but just to know each of your releases and be able to roll back to them.
Also I'll point out that some loose coupling between services is unavoidable even in the best case scenario. Sometimes breaking changes happen, or new features need to be taken advantage of. This necessitates some level of (perhaps ad-hoc) "orchestration." If you add a new feature to a microservice and rely on it elsewhere, there's an implicit dependency to that version (or later) of the microservice now.
I've been looking at Sentry for this, recently. They have a specific feature for tracking releases (and even relating them to errors vs. commits) which looks very interesting. Haven't tried it yet though.
As it stands, with what I've seen and heard about microservices, I'd say the best way to deal with them is to use a monolith 90% of the time, and for the rest of the time make sure your microservice could stand as its own SaaS if given enough love.
Not a direct solution to your problem but might be an indirect one.
When a network call is involved, never.
The somewhat more modern way with Kubernetes deployments is the Helm "chart of charts" pattern, where your system level deployment is a single Helm chart that does nothing but pull in other charts, specifying the required semantic version of each sub-chart in the values.yaml file.
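As a hypothetical illustration of that layout (chart names, versions, and the repository URL are all made up; note that in current Helm the version pins live under `dependencies` in the umbrella chart's Chart.yaml):

```yaml
# Chart.yaml for the umbrella ("chart of charts") deployment
apiVersion: v2
name: my-system            # hypothetical system-level chart
version: 1.4.0
dependencies:
  - name: orders-service   # hypothetical sub-chart
    version: "~2.3.0"      # semver range: any 2.3.x
    repository: "https://charts.example.com"
  - name: payments-service
    version: "1.7.2"       # exact pin
    repository: "https://charts.example.com"
```

The umbrella chart itself ships no templates; it exists only to pin a known-compatible set of sub-chart versions that deploy together.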
The older, but also much more flexible way I've seen it done is through something a local system architect developed a while back that he called a "metamodule." This was back when Apache Ant was still a popular build tool, Apache Ivy was a preferred means of dependency management, and microservices were being deployed as Java OSGi components. Ivy defines a coordinate to uniquely identify a software dependency by organization, module, and revision. So a metamodule was just a module, but like the chart of charts, it doesn't define an actual software component, but rather a top-level grouping of other modules. Apache Ivy is significantly more flexible than Helm, however, allowing you to define version ranges, custom conflict managers, and even multiple dependencies that globally conflict but can be locally reconciled as long as the respective downstreams don't actually interact with each other.
Be aware both of these systems were for defense and intelligence applications. Personally, I would just recommend trunk based development and fail fast in production for most consumer applications, but for things that are safety or mission critical, you can't do that and may have very stringent pre-release testing and demonstration requirements and formal customer acceptance before you can release anything at all into ops, in which case you need the more complicated dependency management schemes to be able to use microservices.
Arguably, in this case, the simplest thing to do from the developer's perspective is don't use microservices and do everything as a monorepo instead, but government and other enterprise applications usually don't want to operate this way because of being burned so much in the past by single-vendor solutions. It's not totally impossible to have a monorepo with multiple vendors, but it's certainly a lot harder when they tend to all want to keep secrets from each other and have locally incompatible standards and practices and no direct authority over each other.
All of our microservices have deployment charts, with frozen image versioning. That way, we can roll out a whole release knowing the services are all compatible with each other, and can easily roll back just by reverting in git.
CI/CD updates image versions in affected YAMLs on every backend release and Flux keeps staging in sync. When we are happy, we sync to production branch, Flux syncs and it's done.
If we spot an issue that we didn't see in staging, we either release a hotfix or rollback.
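A toy version of the "CI bumps the image tag in the affected YAMLs" step might look like this in Python (registry path and tags are made up; real setups often use kustomize or Flux's image automation instead):

```python
import re

def bump_image_tag(manifest: str, image: str, new_tag: str) -> str:
    """Rewrite 'image: <image>:<old-tag>' lines in a manifest to point
    at new_tag, leaving every other line untouched."""
    pattern = re.compile(rf"(image:\s*{re.escape(image)}):\S+")
    return pattern.sub(rf"\1:{new_tag}", manifest)

manifest = """\
containers:
  - name: api
    image: registry.example.com/api:v1.4.2
"""
print(bump_image_tag(manifest, "registry.example.com/api", "v1.5.0"))
```

The CI job would commit the rewritten file to the gitops repo, and Flux picks up the change from there; rolling back is just reverting that commit.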
I've seen both advocated for, interested in what the consensus is.
Backend is a monorepo. I can easily check the commit history in gitops repo to see what was the state of backend when the release was made.
Nothing should be lost, we keep history of everything this way.
To elaborate:
- I do think there is value in "utility microservices". For example: a microservice to send email, a microservice to filter spam, etc. These are the next level libraries (because they do need to run as services 24/7). Management usually don't like these kind of microservices because these "domains" usually don't belong to any particular team, so managers cannot "own" their success.
- I don't think there's much value in building microservices for the core of your business (e.g., a checkout microservice, a payments microservice, etc.). The usual argument management gives is: "we'll make teams more independent and they will be able to deliver stuff faster than with a monolith!". While this is sometimes true, "faster software delivery" is not on my top list of priorities when it comes to building software.
* build code
* run tests (unit + integration using database)
* build docker image
* push to gitlab registry
* deploy to staging k8s environment by using a custom image that just templates a .yml and does `kubectl apply` against the staging cluster
* optional extra "deploy to production" that works in the same way but is triggered with a manual button click in the pipeline.
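Strung together, those stages might look roughly like this in a `.gitlab-ci.yml` (script names and details are hypothetical):

```yaml
stages: [build, test, deploy]

build:
  stage: build
  script:
    - make build
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA   # gitlab registry

test:
  stage: test
  script: ["make test"]          # unit + integration (with database)

deploy_staging:
  stage: deploy
  script: ["./template-and-apply.sh staging"]   # templates .yml, kubectl apply

deploy_production:
  stage: deploy
  when: manual                   # the manual button click in the pipeline
  script: ["./template-and-apply.sh production"]
```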
I don't do canary deploys or anything. Just deploy to staging, and if it works, promote to production.

For some projects I have "staging test scripts" which I can run from my devmachine or CI that check some common scenarios. The test scripts are mostly blackbox using an HTTP client to perform a series of requests and assert responses. (signup flow scenario for example)
I would like to move to a monorepo, but I have not yet figured out an easy way to have a separate pipeline for each service that is only triggered when that service has changed.
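For what it's worth, GitLab CI's `rules: changes:` can scope a job to a subdirectory, which is one way to get per-service pipelines in a monorepo (paths and script are hypothetical):

```yaml
deploy_orders:
  script: ["./deploy.sh services/orders"]
  rules:
    - changes:
        - services/orders/**/*
```

With one such job per service directory, a commit only triggers the pipelines for the services it actually touched.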
edit: formatting
Feel free to ask question or reach out :)
A team should own a microservice; you release as soon as the team is able to.
You version your apis, so you don't break any services which rely on yours.
I agree, but in practice it seems more companies break it rather than follow it.
It's a totally impractical standard for modern software development, but the developers themselves have no choice in the matter until the customers change.
In the past, instead of canary, we used a staging environment with manual promotion. That was costing us a cool half a million in AWS overpriced machines (but we were committed to spend a certain amount of money per year in exchange for discounts, so it's hard to price things) and it was doubling the testing process (promote to staging, test, promote to prod, test). We have been bitten by issues happening in production and not in staging. With the canary, prod only approach we have higher risks of messing up with real data but we have safeguards in place and the canary approach means that a small portion of the users will see problems. We also have the option to deploy to a canary for devs only.
I'm not happy about using / running / maintaining jenkins (terrible UI, upgrade path, API to add plugins, etc) but it does the job and it improved a fair bit over the last 5 years. Jenkinsfile are especially nice, even though not being able to easily run them locally is a bit annoying.
For always-on systems we have a simple dashboard that each service interacts with.
We don’t have a fancy CI/CD pipeline or anything like that, just a set of rules that you have to follow.
Database-wise a service has to register itself with one of our data-gatekeepers, which involves asking for permission for the exact data used with a reason. But beyond that services are rather free to make “add” changes, often in the forms of new tables that are linked with views. It’s not efficient, and we have a couple of cleanup scripts that check if anyone subscribed to all the data, but we’re not exactly Netflix, so the inefficiency is less expensive than doing something about it.
Founder of OpsLevel here (https://www.opslevel.com).
A lot of companies build their own internal microservice tracking tools. Not just for release/deployments, but also for tracking service owners and production readiness.
e.g., Shopify has ServicesDB ([1]) and Spotify has System-Z [2], which they recently open sourced as Backstage [3].
If you're down to build / maintain your own service catalog, those are good places to start.
We started OpsLevel a few years back because we saw a pretty clear need for a product in this space. OpsLevel tracks your services and their owners, production readiness of your services, and brings together lots of event/metadata about your services (including deploys).
There's been a lot of traction in this space over the last few years with a lot of new companies popping up. I'm glad to see some of our newer friends in the space chiming in this thread.
[1] - https://shopify.engineering/e-commerce-at-scale-inside-shopi...
[2] - https://dzone.com/articles/modeling-microservices-at-spotify...
[3] - https://backstage.io/
It helps that I have One Deployment Script To Rule Them All (or really, a couple DSTRTA's). When every service has its own special build & deploy script you have to ask nicely and hope people keep up with it. A lot of CI/CD systems force you into that corner because of an implicit assumption that each build & deploy is its own special one-off.
Anyhow, text files rule, at least as an ad-hoc solution.
Our production deployment jobs are in Jenkins and isolated. It's easy to check what was deployed when. We also have a script written that can run an environment report to see what versions and which microservices have been deployed. Along with their CPU/memory allocations, number of pods etc.
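An environment report like that can be built by parsing `kubectl get deployments -o json`; here's a sketch of the parsing half, run against a made-up sample (field paths follow the standard Kubernetes Deployment schema):

```python
import json

def environment_report(deployments_json: str):
    """Summarize deployed services/versions from kubectl's JSON output."""
    items = json.loads(deployments_json)["items"]
    report = []
    for d in items:
        # Assumes one container per pod for brevity.
        container = d["spec"]["template"]["spec"]["containers"][0]
        report.append({
            "service": d["metadata"]["name"],
            "image": container["image"],          # carries the version tag
            "replicas": d["spec"]["replicas"],    # number of pods
            "cpu": container.get("resources", {}).get("limits", {}).get("cpu"),
        })
    return report

# Made-up sample standing in for `kubectl get deployments -o json`.
sample = json.dumps({"items": [{
    "metadata": {"name": "orders"},
    "spec": {"replicas": 3, "template": {"spec": {"containers": [
        {"image": "registry.example.com/orders:v2.1.0",
         "resources": {"limits": {"cpu": "500m"}}}]}}},
}]})
print(environment_report(sample))
```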
Release management tracks which JIRA stories are in which release, they do it mainly by looking at master merges between prod deployments.
Parent comment doesn't mention whether identification of versions is done manually or whether they just grab master. If the latter, it's probably reasonable. At $myclient, every release to stage and prod requires teams to manually identify each version of each microservice as well as the stories (JIRA tickets) that are being deployed. This is extremely painful, time-consuming, and error-prone. Avoid at all cost; as the number of services grows, the pain/time/error cost appears to increase geometrically.
Sorry, I'm just kidding, but that's the only thing I could think of when I heard the number 200!
We have standardized pipeline models that we reuse everywhere. Service owners are responsible for updating their pipelines to pick up changes. As we mature, we're moving a lot of it into ci templates and key changes will be picked up automatically. There are a few pipelines that occasionally require manual steps but those are uncommon. As we add more continuous testing, we'll be deploying more frequently. Once we've gotten good at that, then we'll be working on a/b testing and/or feature flags.
https://www.altoros.com/blog/airbnb-deploys-125000-times-per...
Our community Discord Server (questions on DevOps and DataOps, not limited to Reliza Hub) - https://discord.gg/UTxjBf9juQ
The only way to accomplish what you're asking for would be extremely thorough mock testing.
This is a serious comment.