I know very little about Python, but does this mean that Python has no way to encapsulate code at a level larger than a class? Something like a package or a module. It does not seem like it should be necessary to break a system into separate services just to get encapsulation at a module or subsystem level.
Granted, you shouldn't do very much of that except in some extremely specialized and careful circumstances, but in a language that makes it really easy, you have to be extremely disciplined and often delivery velocity takes precedence over careful planning of what really constitutes a public API.
Practically speaking, putting separate services in literally separate process spaces where the only way they can communicate is via message passing is the only way to really enforce encapsulation.
The language designers definitely took the path of we're just going to assume you know what you're doing and give you absolutely every last foot of rope you could possibly want to hang yourself with.
It does make things fast and easy. See all of the complaints here lately about having to rewrite significant parts of an application to become async as soon as any part of it becomes async? Not a problem in Python. There's a library to just automatically rewrite all code as it is being imported, at the cost that the code you're actually running is not the code you see in your repos.
Python has packages, which contain modules. This doesn't seem to be a reference to the lack of encapsulation, in any case.
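To make the packages-and-modules point concrete, here is a minimal sketch of how far module-level privacy goes in Python. The module name "billing" and its functions are made up for illustration, and the module is built in memory only so the snippet runs on its own. The underscore prefix and `__all__` only limit what a star import pulls in; nothing stops a caller from reaching the "private" name directly.

```python
import sys
import types

# Hypothetical module, created in memory so the example is self-contained.
billing = types.ModuleType("billing")
exec(
    '''
__all__ = ["charge"]          # the intended public API

def charge(amount_cents):
    return _apply_fee(amount_cents)

def _apply_fee(amount_cents):  # "private" by naming convention only
    return amount_cents + 30
''',
    billing.__dict__,
)
sys.modules["billing"] = billing

# A star import respects __all__ and brings in only the public name...
namespace = {}
exec("from billing import *", namespace)
star_names = sorted(n for n in namespace if not n.startswith("__"))

# ...but direct attribute access to the underscore name still works.
fee_result = billing._apply_fee(100)
```

So the boundary exists, but it is enforced by convention and code review rather than by the language.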
Though it seems to be projecting a social problem (absence of a policy involving active decisions on when particular code can be used by endpoints) onto architecture.
import foo
foo.auth("admin", "badpass")  # returns False
def new_auth(user, password):  # "pass" is a reserved word, so it can't be a parameter name
    return True
foo.auth = new_auth
foo.auth("admin", "badpass")  # now returns True
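For anyone who wants to run the snippet above without a real "foo" module, here is a self-contained sketch. The stand-in module, the hard-coded credentials, and the login() helper are all assumptions made up for the demo. It also shows why this matters: code inside the module that calls auth() picks up the replacement too.

```python
import types

# Hypothetical stand-in for "foo", built in memory so this runs on its own.
foo = types.ModuleType("foo")
exec(
    '''
def auth(user, password):
    # Toy check against a single hard-coded credential pair.
    return (user, password) == ("admin", "s3cret")

def login(user, password):
    # Looks up auth() in the module's globals at call time.
    return "welcome" if auth(user, password) else "denied"
''',
    foo.__dict__,
)

before = foo.login("admin", "badpass")

def new_auth(user, password):
    return True

# Any importer can rebind the module attribute...
foo.auth = new_auth

# ...and the module's own login() now uses the replacement.
after = foo.login("admin", "badpass")
```

The module never exported a hook for this. Assigning to foo.auth just rewrites the module's global namespace, which is the same dictionary login() reads from.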
For example, say you are updating service B to call a new endpoint on service A. First you need to make the service A endpoint available, and only then make service B call it.
Just because everything exists in the same repo does not mean it all gets shoved out at once. The downside is that you can't just read the code and assume the running service is doing that, unless it's embedded in your build. Processes like automated updates and a forced update cadence (no running binaries over X days old) with proper canary/vetting before a full release allow a large org to still manage this complexity.
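As a rough sketch of what a "no running binaries over X days old" rule could look like, assuming each build records a timestamp somewhere the fleet can read it. The 14-day limit and the function name are hypothetical, not something from the article.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value for "X days"; a real policy would pick its own number.
MAX_BUILD_AGE = timedelta(days=14)

def is_too_old(built_at, now, max_age=MAX_BUILD_AGE):
    """Return True if a binary built at `built_at` exceeds the allowed age."""
    return now - built_at > max_age

now = datetime(2021, 3, 20, tzinfo=timezone.utc)
fresh = is_too_old(datetime(2021, 3, 10, tzinfo=timezone.utc), now)
stale = is_too_old(datetime(2021, 3, 1, tzinfo=timezone.utc), now)
```

A check like this only flags stale binaries. The canary and vetting steps would still have to live in the release pipeline.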
If we were doing continuous delivery, too, I could see there not being much value in messing with independent versioning, semver, whatever. Just make today's date the universal version numbering system for all modules and move along.
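A minimal sketch of the "today's date is the version" idea, assuming a zero-padded YYYY.MM.DD format (my choice, not anything standard) so that versions also sort correctly as plain strings:

```python
from datetime import date, datetime, timezone

def date_version(day: date) -> str:
    """Format a date as a zero-padded YYYY.MM.DD version string."""
    return f"{day.year}.{day.month:02d}.{day.day:02d}"

def todays_version() -> str:
    """Version for a build cut today, using UTC so all builders agree."""
    return date_version(datetime.now(timezone.utc).date())

example = date_version(date(2021, 3, 4))
later = date_version(date(2021, 12, 31))
```

Every module in the repo would share the same string, which is the point: no per-module bookkeeping. Two builds on the same day would need an extra suffix, which this sketch does not handle.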
If you have a specific version per subproject, how do you track that in the repo? Different tag schemes for different subprojects? I have used that in a small-ish monorepo and I didn't especially like it.
"We evaluated using off-the-shelf solutions to run the platform. But in order to de-risk our migration and ensure low engineering costs, it made sense for us to continue hosting services on the same deployment orchestration platform used by the rest of Dropbox."
It sounds like they acknowledge they're reinventing a lot of stuff but for now are sticking to their internal platform. Perhaps Atlas is a half-step then to get teams used to owning and running their code as isolated services. But everything I read that they built in Atlas--isolated orchestrated services, gRPC load balancing, canary deployments, horizontal scaling, etc.--is a bog standard feature of Kubernetes today. I'd be very leery of maintaining a bespoke Kubernetes-like platform in 2021 and beyond--in some ways it seems like it's just shifting the monolith technical debt into an internal Atlas platform team's technical debt. What's the plan to get rid of that debt for good, I wonder?
This hurdle shows there are already some cracks in the idea of a long-term Atlas, too:
"While splitting up Metaserver had wins in production, it was infeasible to spin up 200+ Python processes in our integration testing framework. We decided to merge the processes back into a monolith for local development and testing purposes. We also built heavy integration with our Bazel rules, so that the merging happens behind the scene and developers can reference Atlasservlets as regular services."
If I read that right, does it really mean the first time a developer's code is run like it will run in production is when it goes out to canary deployment? I.e. integration tests are done in a local monolith instead of setting up a mini-prod cluster. That seems a bit nerve-racking as a dev, to have no way to really test the service until bits are hitting user requests. In the k8s world a ton of work has been put into tooling and processes to make setting up local clusters easy. It's a shame to not have something similar for Atlas.
> in some ways it seems like it's just shifting the monolith technical debt into an internal Atlas platform team's technical debt.
This is a key insight into the monolith problem. How does a monolith become poor and unmaintainable? A monolith becomes poor quality and unmaintainable when there is no entity enforcing architectural simplicity. It becomes unmaintainable when there is no team focusing solely on how the monolith functions. It becomes unmaintainable if there is no entity capable of saying "no" to a product engineer. A monolith in a company with weak leadership is a tragedy of the commons where everyone takes from the commons by adding complexity and there is no governing entity to ensure that the commons remains viable.
The exact statement you made is the key strength of this approach. Where there was a vacuum of responsibility before (monolith technical debt), a team has been created with direct responsibility and authority creating a governing force over that technical debt/overall complexity and therefore an entity directly responsible for improving it. This is a key first step. Atlas appears to be a compromise solution rather than an ideal end state.
Having worked in a company where no single team "owned" the monolith, the term "communally owned" tended to come up.
It was generally understood within the platform teams that if everyone owns it then in reality, no one owns it :)
"Metaserver was stuck on a deprecated legacy framework that unsurprisingly had poor performance and caused maintenance headaches due to esoteric bugs. For example, the legacy framework only supports HTTP/1.0 while modern libraries have moved to HTTP/1.1 as the minimum version."
Dropbox has been around for a lot of years, and raised a lot of cash; was it only recently that they could pay down this technical debt? Were they really so busy in other areas that this was allowed to fester?
The legacy framework was Pylons, which eventually evolved into Pyramid.
The tldr is there were hundreds of unowned endpoints that, yes, were allowed to fester. They eventually got ownership on all endpoints, so you had somebody to exert pressure on to make things happen.
https://dropbox.tech/infrastructure/rewriting-the-heart-of-o...
Then I don’t understand the delay in shipping an Apple Silicon build. Right now we still have to use Rosetta... it’s the only such piece of software I have that does.
Does this imply that the atlas team gets into the weeds of understanding the business and business logic behind these endpoints to know the scalability and throughput needs? Is the autoscaler really good enough to handle this? If it's transparent to the product team, are they aware of their usage (potentially unexpected)? I imagine the atlas team would have to be very large with these sorts of responsibilities.
From a product team perspective I imagine they are still responsible for database configuration and tuning? Has the daily auto-deployment led to unexpected breaks? Who is responsible for rollbacks? And is the product team responsible and capable of hotfixes?
Maybe a more broad question which all of my questions above speak to: how are the roles and responsibilities set up between the atlas team and the product engineering team that owns the code, and how has the transition to that system been?
So... about this headline. I read this aloud to a friend at a cafe. We laughed. It makes perfect sense to us. We know what Python is. We know what a monolith means in this context.
To my other friends it was the funniest / silliest / most nonsensical thing they'd heard in a while.
IT is weird.
(ps I know no one will see this comment but I'll leave it here. Because.)