I always wonder why some (most) banks are proud of being reckless... Oh well, it keeps me well paid.
Also, Monzo decided to remove the "dark mode" option back in the day. When I wrote to them about it ("please bring it back as an option, as it already was"), they responded with a polite "nope, suck it up". My next message to them was to close my account. Well... "nope, suck it up" right back at you.
Anyway, since I don't like them, one may say that I'm negatively biased, but still: what does their Audit Universe look like?
But it all still feels very amateurish, especially for a bank. Take something as simple as generating a proof-of-payment receipt for a bank transfer: why is this not possible? It feels incredibly unprofessional to send a company a screenshot of a mobile app because your bank doesn't let you properly export a PDF for a single transaction.
Those are only backend microservices, not counting data pipelines and other supporting applications.
What constitutes a microservice is a philosophical question closely tied to the company's culture. Some companies prefer very small, single-use-case services; others build microservices that support a whole isolated piece of functionality (with business logic) to be reused across the stack. There's no single definition that applies to such a variety of architectures.
Doesn't wrapping the old library require a lot of effort to update all call sites?
If this is supposed to be general advice about libraries... does it mean you should wrap all libraries? That doesn't sound like a good idea to me.
However, for some of the core platform functionality that all services depend on, we've seen a lot of value in wrapping, because:
- We can provide a more opinionated interface (a third-party library is typically more flexible because it has to support a far wider variety of use cases).
- We can easily hook in our own instrumentation (e.g. metrics, logging) or change functionality with config/feature flags.
- We can transparently change the implementation in the future.
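To make those three points concrete, here's a minimal sketch of such a wrapper, assuming a hypothetical flexible client call (`third_party_fetch`) and a hypothetical flag name (`short_timeouts`); neither is a real library API. Callers only pass a URL (opinionated interface), every call is counted and timed (instrumentation hook), and the backend is injected so it can be swapped later without touching call sites.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Optional


# Hypothetical stand-in for a flexible third-party call: lots of knobs that
# individual services shouldn't each be tuning on their own.
def third_party_fetch(url: str, *, timeout: float = 30.0, retries: int = 0,
                      verify_tls: bool = True) -> str:
    return f"payload from {url}"


class PlatformFetcher:
    """Opinionated wrapper: fixed defaults, built-in metrics, swappable backend."""

    def __init__(self, backend: Callable[..., str] = third_party_fetch,
                 flags: Optional[Dict[str, bool]] = None) -> None:
        self._backend = backend  # can be replaced without changing any caller
        self._flags = flags if flags is not None else {}
        self.call_counts: Dict[str, int] = defaultdict(int)
        self.latencies: List[float] = []

    def fetch(self, url: str) -> str:
        # Behaviour changed via a (hypothetical) feature flag instead of a code change.
        timeout = 2.0 if self._flags.get("short_timeouts", True) else 10.0
        start = time.perf_counter()
        try:
            # Opinionated: retries and TLS verification are decided here, not by callers.
            return self._backend(url, timeout=timeout, retries=3, verify_tls=True)
        finally:
            # Instrumentation hook: every call is counted and timed, success or failure.
            self.call_counts["fetch"] += 1
            self.latencies.append(time.perf_counter() - start)
```

Swapping the implementation is then just `PlatformFetcher(backend=new_client_fetch)` for whatever the new client turns out to be, as long as it accepts the same keyword arguments.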
The most important parts of such a system (the ones mentioned in this post, anyway) don't get nearly enough attention:
- "centrally driven migrations": In any distributed service architecture, there are always too many interdependent pieces. You can't reliably touch thing A without also touching things B, C, D, etc. If you want any chance of automating this, or of responding to failure without downtime, you must have a system that is aware of the changing state of everything and can change all the parts at will.
- "database migrations": This is again very complicated and depends on how your code and database are architected. You literally can't do migrations if your code and schema aren't set up right, or if you don't make the right kind of changes. How do you do this? Time to write a book...
- "wrap the old library": I can't remember what this is called, but it has a name. Anyway, the idea is that hiding any change behind what is effectively a feature-flag wrapper lets you deploy the change without enabling it, use the feature flag to test the change in production (on only one request, on a percentage of requests, on one whole node/pod, etc.), and then eventually delete the old code. This isn't just for features; you can replace entire interfaces, software stacks, or whole systems this way, either piecemeal or all at once. Very powerful, but again, it requires a specific approach not only in implementation but in use.
- "use automated rollback checks": What kind of checks? Checking what? In what way? At what time/stage? What happens when one fails? Do you run them in series or in parallel? Can you run them in series or in parallel? Etc.
- "deploy least critical services first": With enough interdependent services, you're going to hit cases where you have to upgrade parts B and C effectively simultaneously before you can upgrade A, etc. So for "no downtime", it will take a lot of coordination, and very explicit linkage and checking of specific new services, etc. There are ways to do this, but it's specific to your implementation and services, so this is another example of how you have to know exactly what's going on, and then set up the deployment to account for your specific dependency tree and how they react when they're run.
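For the "database migrations" point, one widely used approach is expand/contract (also called parallel change): add the new column without breaking old code, write to both columns, backfill in batches, and only drop the old column once nothing reads it. Below is a minimal sketch using the standard `sqlite3` module; the `users` table and the `name`/`display_name` columns are hypothetical, and the final "contract" step (dropping `name`) is deliberately left out because it only happens after every reader has moved over.

```python
import sqlite3
from typing import Optional


def expand(conn: sqlite3.Connection) -> None:
    # Expand: add the new column as nullable so code that doesn't know about it keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.commit()


def write_name(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    # Transition period: new code writes both the old and the new column.
    conn.execute(
        "UPDATE users SET name = ?, display_name = ? WHERE id = ?",
        (name, name, user_id),
    )
    conn.commit()


def read_name(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    # Prefer the new column, falling back to the old one for rows not yet backfilled.
    row = conn.execute(
        "SELECT display_name, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    return row[0] if row[0] is not None else row[1]


def backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    # Backfill in small batches instead of one huge, long-locking UPDATE.
    # Rows with a NULL source value are skipped, otherwise the loop would never end.
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = name WHERE id IN ("
            " SELECT id FROM users"
            " WHERE display_name IS NULL AND name IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

The point the sketch tries to show is the ordering: every intermediate state is one that both old and new code can run against, which is exactly what "code and schema set up right" demands.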
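The "wrap the old library" bullet describes a pattern often referred to as branch by abstraction, combined with a feature flag. Here's a minimal sketch of just the routing part, assuming a percentage rollout and an allowlist of forced keys (a single request ID, a node name, etc.); the class and its parameters are hypothetical, not any particular feature-flag product. Hashing the key keeps the decision stable, so the same request or node always takes the same path at a given rollout percentage.

```python
import hashlib
from typing import Any, Callable, FrozenSet


class FlaggedImplementation:
    """Routes each call to the old or new implementation based on a rollout flag."""

    def __init__(self, old: Callable[..., Any], new: Callable[..., Any],
                 rollout_percent: int = 0,
                 force_new_for: FrozenSet[str] = frozenset()) -> None:
        self.old = old
        self.new = new
        self.rollout_percent = rollout_percent  # 0 = deployed but dark, 100 = fully switched
        self.force_new_for = force_new_for      # e.g. one test request ID or one node/pod name

    @staticmethod
    def bucket(key: str) -> int:
        # Stable hash mapped to 0..99, so a given key always lands in the same bucket.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 100

    def __call__(self, key: str, *args: Any, **kwargs: Any) -> Any:
        use_new = key in self.force_new_for or self.bucket(key) < self.rollout_percent
        impl = self.new if use_new else self.old
        return impl(*args, **kwargs)


# Hypothetical old and new implementations sitting behind the same interface.
def old_rounding(amount: float) -> int:
    return int(amount + 0.5)


def new_rounding(amount: float) -> int:
    return round(amount)


rounding = FlaggedImplementation(old_rounding, new_rounding)
```

Raising `rollout_percent` from 0 towards 100 is the "percentage of requests" step; once it has sat at 100 for long enough, `old_rounding` and the wrapper's branching can be deleted.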
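The questions in the "automated rollback checks" bullet don't have a single answer, but here's one possible combination, sketched under stated assumptions: compare the canary's error rate against the baseline, refuse to judge (and so refuse to promote) on too little traffic, and run checks in series, rolling back at the first failure. The thresholds (`max_ratio`, `min_requests`, the 0.1% floor) and the function names are hypothetical choices, not recommendations.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def error_rate_check(baseline_errors: int, baseline_total: int,
                     canary_errors: int, canary_total: int,
                     max_ratio: float = 1.5, min_requests: int = 100) -> CheckResult:
    # Fail closed: without enough traffic we can't tell, so we don't promote.
    if canary_total < min_requests or baseline_total == 0:
        return CheckResult("error_rate", False, "not enough traffic to judge")
    baseline_rate = baseline_errors / baseline_total
    canary_rate = canary_errors / canary_total
    # Absolute floor so a zero-error baseline doesn't make a single canary error fatal.
    limit = max(baseline_rate * max_ratio, 0.001)
    return CheckResult("error_rate", canary_rate <= limit,
                       f"canary={canary_rate:.4f} limit={limit:.4f}")


def run_checks_in_series(
    checks: List[Callable[[], CheckResult]],
) -> Tuple[str, Optional[CheckResult]]:
    # Series: later checks only run if earlier ones passed; the first failure decides.
    for check in checks:
        result = check()
        if not result.passed:
            return "rollback", result
    return "promote", None
```

Running checks in series like this is simple to reason about but slower; running them in parallel would need a different aggregation rule, which is exactly the kind of decision the bullet says people skip.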
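The ordering part of the last bullet (B and C must be upgraded before A) is a topological sort of the dependency tree. A minimal sketch using the standard library's `graphlib` (Python 3.9+) groups services into deployment waves; within a wave, a hypothetical `criticality` score puts the least critical services first. It only handles ordering: it does not solve B and C needing to switch at effectively the same instant, which still needs backwards-compatible interfaces or similar coordination.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def deployment_waves(must_deploy_after: Dict[str, Set[str]],
                     criticality: Optional[Dict[str, int]] = None) -> List[List[str]]:
    """Group services into waves; each service maps to the services it must follow.

    Within a wave, lower (hypothetical) criticality scores go first.
    Raises CycleError if the dependencies can't be ordered at all.
    """
    crit = criticality or {}
    sorter = TopologicalSorter(must_deploy_after)
    sorter.prepare()  # detects cycles up front
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda s: (crit.get(s, 0), s))
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as deployed before moving on
    return waves
```

For the example in the bullet, `deployment_waves({"A": {"B", "C"}})` gives `[["B", "C"], ["A"]]`.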
So many people I've run into don't think about any of these things. They literally say things like "automated rollbacks are easy, we did it at XYZ place", as if none of the above mattered. They stick their heads in the sand because they want to believe it should be easy. But any engineer worth their salt will tell you that doing it correctly and reliably is bloody complicated.