> Of course in a Good team, needless dependencies would be weeded out in code reviews, and a Culture would evolve over time avoiding needless dependencies.
Really, the one consistent thing is that if you have a good team, you'll make it work no matter what tech or decisions you choose (assuming you're also good enough to know when you've lost and change course); and if you have a bad team, you're doomed to failure, because, well, you're bad (by definition).
I think this article also vastly underestimates the cost and annoyance of the tooling of CI'ing a large number of repos, especially if you have to match or do some kind of cross product on the feature branches. (such as, repo A branch B can only be built with repo C branch F, but all the other repos should be master)
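To make the annoyance concrete, here's a minimal sketch of the kind of branch-matching script such a setup tends to accumulate (the repo and branch names are hypothetical): for each repo, check out the feature branch if it exists, otherwise fall back to master.

```shell
# checkout_matching BRANCH REPO...
# Check out BRANCH in each repo where it exists; fall back to master elsewhere.
checkout_matching() {
  local branch=$1; shift
  local repo
  for repo in "$@"; do
    if git -C "$repo" rev-parse --verify --quiet "refs/heads/$branch" >/dev/null; then
      git -C "$repo" checkout -q "$branch"
    else
      git -C "$repo" checkout -q master
    fi
  done
}

# e.g.: checkout_matching feature-B repoA repoC repoD
```

And this still doesn't cover the "repo C needs branch F" special case: that usually ends up as a hand-maintained mapping file feeding the CI job.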
The middle ground is vast, and nuanced in many dimensions. Which is a good thing, because there sure aren't very many large, good teams (by this definition of "good").
I think the normal case though is that there are too many cooks, and they spoil the broth. I've had teams where one person wants to go off and make their own repo just because, and can't be convinced to follow the rest of the team. Sometimes these are good people, although I find a lot of "bad" people don't want to be team players and be consistent, even if being consistent means doing something they don't like.
This alone is pretty much what makes me prefer monorepos. If you don't have a stable interface for all of your in-house dependencies (and nobody does early on in a project), you're doomed to spend a ton of time matching branches like this. Not to mention, a naive build process of "grab the latest everything and build it" will break in that period of time when you've merged the feature branch in one repository but not the other.
Heh. This reminded me of a different story, which I remember vaguely enough that I'll paraphrase from memory:
> "The Excel team will never go for it. Their motto is 'Find the dependencies... and eliminate them.'"
> This probably explained why the Excel team had its own C compiler.
There is a little bit of a novel problem in correlating N feature branches and cloning them, but it's not that much more complicated than correlating N subprojects in a monorepo.
You don't want people favoring starting new work while someone else is flailing on old work. The fact that they started this two weeks ago indicates it was probably higher priority than whatever you might start today. If today's work is an emergency (e.g., if we could go back in time, we'd have started this immediately) then sure. But barring extenuating circumstances, go help Paul. He's been staring at that code for weeks and making no progress.
I think having a rule for how many branches/forks (whatever you want to call them) can exist at once might be a good idea. Every time another opportunity to use a branch comes up, the existing branches have to defend their continued existence. Having to explain yourself over and over is a form of positive peer pressure, if potentially a little passive-aggressive (solution: have an assertive person be the messenger).
(responding to the article contents quoted in above comment)
Branching: monorepo or not, if a feature-incomplete development branch for one of the supported targets can "hold the entire organization as a hostage" then the SCM people, and/or the people responsible for the SCM policy, should do some introspection...
Why are deliveries done from a branch which is obviously still in development? Why does code-to-be-released need to depend on incomplete work? Why aren't something like "topic branches" used?
Modularity: monorepo or not, problems will certainly appear when the complexity of the implementation outpaces the capacity created by the design. To get modularity, one needs actual modules with properly designed (= not brittle; DRY, KISS, YAGNI, SOLID, etc.) interfaces between the modules. Now, does monorepo/multirepo really play a role here at all? If everyday changes are constantly modifying the module interfaces in incompatible ways that break existing code, that says something about the design, or rather the insufficiency of it.
Of course, every project and team is different. However, even if a locally optimal choice for the monorepo vs. multirepo question is found, problems existing regardless of monorepo/multirepo will still be there.
Formally speaking, multi-repo management allows a strict subset of the diffs allowed in a mono-repo (because diffs can't extend beyond each repo root). Are the excluded possibilities all bad? No. Are they generally bad? Not really. Are they sometimes bad? Sure. Are they sometimes better than many diffs across many repos? Sure. Can a reasonably competent dev team tell the difference? Sure, usually. Unsurprisingly, this usually requires the exact same tooling as ensuring the quality of microrepo changes.
If you're continuously deploying master, have a healthy ci/cd pipeline, and enforce good merging discipline, you're fine either way.
I'm a little tired of doing things like revving our trace and logging libraries across our 50+ micro repos that represent microservices. That's genuinely obnoxious. Is it bad? No. Is it obviously more or less error prone than the equivalent monorepo update? No. All the bad bits of either strategy just require some tooling and a clear head.
But I'm not sure that's a useful feature anyway:
1) If you are doing a whole-repo refactor (one of the main atomic-commit benefits I see claimed), you still have to run on X -> try to commit X+1. If someone committed in between you may have to redo the whole thing. Or lock the whole monorepo while doing so. Both scenarios seem worse to me for mono, since microrepos stand far less of a chance of conflicting (less frequent commits, less code to consider (faster refactoring tool runs), etc) and a lock would be a far smaller interruption (one repo vs the whole company).
2) Atomic commits don't represent how things are deployed. You still have to deal with version N and N-1 simultaneously. So e.g. breaking refactors of RPC APIs have exactly the same problems in mono vs micro.
On the other hand, downsides are pretty clear and take immense work to sidestep: most tools will either be much slower or not work at all, because they now need to work on 100s or 1000s of times more data than they were developed against. That's probably thousands of man-years of tooling you may have to understand and improve, or wholly replace.
---
The vast majority of monorepo benefits that I usually see claimed are actually tool-standardization benefits. Or "we could build tool X to do that". Or top-level control, like "we can commit for team X". Of course that's useful! But it has nothing to do with monorepo vs microrepo.
Monorepo just happens to be the carrot/stick used to finally achieve standardization. Others could work, this is just the current fad (which, in some ways, is why it sometimes works - it's easier to convince others).
When the right boundaries reveal themselves, you can divide the code up. But who is to say those will still be the right ones in ten years?
If you divide the source code into separate repositories before getting the boundaries right, there's a tremendous amount of friction built into the system preventing the problem from being addressed. Each repository has its own actors, cycles, and version control history, and you break two of those when you start trying to move code across project boundaries. So people just hit things with a hammer or steal functionality (three modules with a function that ostensibly does the same thing but with different bugs).
One of the things I see over and over again is people conflating one repository with one lifecycle. One binary. It's possible to have a monorepo with multiple build artifacts. The first monorepo I ever worked on had 60 build artifacts, and it worked pretty well (the separate artifacts weeded out a lot of circular dependencies).
I can still get inter-version dependency sanity checks with a monorepo. When I am writing new code I can have everything talk to localhost (master@head) or I can have it talk to a shared dev cluster (last labelled version) or some of both, allowing me to test that I haven't created a situation where I can't deploy until I've already deployed.
Monorepos will not save a company from its lack of discipline. But while you can have problems if you do stupid things in a monorepo, with multirepos you will always have to deal with dependency hell and everything that comes with it.
We've spent a lot of time building and iterating on a unified ci/cd environment to support the new repo. Previously each project had its own test/deploy/build/publish story and usually its own Jenkins project. Now, each project is registered and triggers its own steps. Cross-project edits can happen in a single pull request. We have an incredible amount of integration tests (more so than unit tests), and getting them to work cross-project while migrating has been challenging.
We've gone from ~10-15 actively maintained repos to about 3 as we slowly migrate. We have services, libraries, and batch processing all mixed in.
The author's points about forking and long-lived branching being incredibly difficult for most teams are really crucial. We're going to have to invest in educating new members about WHY we have a monorepo, what it means for your development, and how to change your perspective for developing at HEAD. I don't think 'bad' developers make it easier or harder. Instead, clearly articulating the behaviors that differ between a poly-repo and a mono-repo world is the differentiator.
These articles were absolutely crucial to developing our monorepo.
https://trunkbaseddevelopment.com/
http://blog.shippable.com/ci/cd-of-microservices-using-mono-...
https://www.godaddy.com/engineering/2018/06/05/cicd-best-pra...
I feel like this discussion is missing an appreciation for size/scope of repositories vs. size/scope of the organisation developing that software, with a pinch of appreciation for Conway's law.
If your team is a typical team of at most, say, 30 people, then maintaining 15 different repositories is clearly insane, but merging them into a single one likely doesn't truly deserve the moniker "monorepo", because it's just not that large (and varied in scope and purpose) of a project at the end of the day.
Think of it this way: the Linux kernel is certainly a larger project, but nobody thinks of it as a monorepo. Same thing goes for major software projects like Qt.
I think that's the big thing that always puts me off monorepo... We'd basically be going from ten 5 minute builds to one 50 minute build if it wasn't possible to do incremental builds. IIRC Google and MS have purpose built tools that do impact detection to work out what to build for their monorepos to keep build times down.
It's definitely important to consider before jumping in. Going from 5m to 50m compile times would be a major issue for me.
We have plans to use Bazel in the future, but you have to boil the ocean when moving to Bazel and get everything inside it before you get any benefit out of it.
Jenkins can't do it "easily" but it definitely can. I'd be happy to share our Jenkinsfile if you'd like.
Our finding of changes is something like:
#!/bin/bash
set -euxo pipefail

COMPARE_BRANCH=$1
MERGE_BASE=$(git merge-base $COMPARE_BRANCH HEAD)
FILES_CHANGED=$(git diff --name-only $MERGE_BASE | grep '/')
echo ${FILES_CHANGED} | xargs dirname | cut -d "/" -f 1 | sort | uniq
[1] blog.shippable.com/ci/cd-of-microservices-using-mono-repos
In a way this is one of the hallmarks of a monorepo - Interfaces and dependencies changing so quickly it becomes too troublesome for humans to categorize (and re-categorize) them into repositories, so you let a machine (makefiles) do the work instead. And even without a monorepo you still have the same problem, eventually you will have to integrate all your mini repos into one final product, which you want to have tested. This is something you want to do as frequently as possible, ideally on every commit, not by doing major version-steps of sub-projects.
The major technology organizations we hear about usually have at least several monorepos, due to the legacies of acquisitions and mergers if nothing else.
At the scale of thousands of subprojects, I am not entirely sure the benefits are as advertised. There will be subprojects forked to public github.com or gitlab.com to support, if nothing else. And there will be external dependencies to manage; system-level libraries like openssl and libc if nothing else. Even if they are vendored into the monorepo, any upstream regression is a significant problem in a monorepo... and the problem sometimes has to be solved in a big bang instead of incrementally.
At 15, it feels like it's kind of just a toss up. We have several thousand repos, and sometimes we see 5-10 of them that really should be grouped, and we do so. Sometimes we see 1 repo that has 5-10 projects in it, and we break them down. Whatever works.
But when the entire org is on a dozen projects you're potentially in the worst of both worlds. Your repos aren't small enough, or aligned with team ownership enough, to really benefit from it. So it's straight overhead.
Last I heard there were plans to move www into fbsource.
There are certainly not random dependencies on public GitHub pages. Everything is versioned.
There is a mind boggling amount of custom tooling to make this work.
"The theory behind the microkernel is that operating systems are complicated. So you try to get some of the complexity out by modularizing it a lot. The tenet of the microkernel approach is that the kernel, which is the core of the core of the core, should do as little as possible. Its main function is to communicate. All the different things that the computer offers are services that are available through the microkernel communications channels. In the microkernel approach, you’re supposed to split up the problem space so much that none of it is complex. I thought this was stupid. Yes, it makes every single piece simple. But the interactions make it far more complex than it would be if many of the services were included in the kernel itself, as they are in Linux. Think of your brain. Every single piece is simple, but the interactions between the pieces make for a highly complex system. It’s the whole-is-bigger-than-the-parts problem. If you take a problem and split it in half and say that the halves are half as complicated, you’re ignoring the fact that you have to add in the complication of communication between the two halves. The theory behind the microkernel was that you split the kernel into fifty independent parts, and each of the parts is a fiftieth of the complexity. But then everybody ignores the fact that the communication among the parts is actually more complicated than the original system was—never mind the fact that the parts are still not trivial. That’s the biggest argument against microkernels. The simplicity you try to reach is a false simplicity."
Also applies to some microservice architectures I've seen. People completely disregard the complexity (and overhead!) of the interactions between microservices.
They're so easy to set up and they immediately solve problems. But they also create many more, which aren't immediately obvious. And very few people are willing to say, "I was totally wrong to move to them, and let's spend some more precious time rolling them back."
RPCs and local method calls both need to be fault tolerant and race-condition free. As you break up datastores, transactions become more complex, but presumably you had a specific reason to do that, so the complexity isn't an arbitrary choice.
Sure, the communication layer is added complexity, but that too should be abstracted into boilerplate so that you don't have to think about it. Overall the added complexity requires more work, but it shouldn't really make your business-logic problems more complicated.
Firstly, the benefits they offer have often little to do with the architecture itself, but with the bigger picture (separating teams, CIs, allowing different stacks, managing costs, scaling, etc).
Secondly, unlike microkernels, not all microservices have to talk to every single other microservice. If you have a service to send emails, say, there'll be a few services that interact with it, but the majority won't. The same for an image resizing service.
So what you say doesn't necessarily hold.
Although Linux still probably does the best job of being stable compared to the other OSes I use (Windows, macOS). I can't recall the last time I hit a kernel panic or a crash (despite worse drivers in some cases).
Seems like a blatant straw man to me.
So I agree with you that Linus is presenting a straw man and your comment shouldn’t have been downvoted.
When you break up a problem the goal is to find clear bottlenecks of complexity such that you can abstract a thing to its inputs and outputs and ignore the complexity within. You reduce the amount of knowledge required from any given perspective, thus reducing peak cognitive load.
Sure the system is as or possibly slightly more complex, but there is a distinct advantage to reducing the peak complexity of any given sub-problem.
I think this is the argument around which the whole post is made. Everyone does want to work in a small space where they control everything. I want to see git log with just my code commits - so I'll make a microservice out of it.
All other arguments are just there to wrap this one. I think it's wrong.
At an organisational level, a monorepo is more good than bad because it simplifies dependency management and makes for a low-ego team.
If engineers or even team leads don't have permission to create a repo themselves, then you're probably going to see benefits from a monorepo.
At the same time, if you have say ~100 people sharing a repo then you have to make sure that you have tooling that allows each team to customize their build and test environments, which is hard because many CI solutions assume that One Repo = One Build Status. Implicit in the author's reasoning is the principle that good engineers don't make mistakes; they don't break the build, not ever. But of course they do, everyone does, and if you have a hundred developers, builds will be broken and people's productivity ruined.
Perhaps because we're an industry so prone to failure we keep looking for that one solution, that given a good team makes all problems go away. Agile, XP, Monorepos, Containers, Microservices: we tell ourselves will solve our problems and get those pesky business people off our backs for good. But they won't and never will.
What really matters is enablement, how can we get our code into production doing what it was intended to do without having your toes stepped on all the time. If you design your processes and tooling around enablement not the cargo-cult flavour of the month buzzword invented in the modern beautifully architected, yet completely open office spaces of one of the FAANG companies, then maybe then you can get some actual results.
They don't, ever, because the VCS refuses a push if it breaks the build.
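A sketch of how that refusal can be wired up: a pre-receive hook that runs a build check against each pushed commit and rejects the push on failure. The `run_ci_checks` command here is a stand-in for whatever builds/tests a given commit; it is not a real git feature.

```shell
# Sketch of a pre-receive hook. Git feeds "<old> <new> <ref>" lines on stdin;
# run_ci_checks is a hypothetical command that exits non-zero if the commit
# fails the build.
pre_receive() {
  local oldrev newrev refname
  while read -r oldrev newrev refname; do
    if ! run_ci_checks "$newrev"; then
      echo "rejected: $refname breaks the build" >&2
      return 1
    fi
  done
}
```

In practice most shops do this asynchronously (a merge queue or gated commits) rather than synchronously in the hook, since builds are slow.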
That's the problem with all those single-repo discussions. It works perfectly well if you have all the tooling that makes a single repo work like a multi-repo.
And it's great because you can enforce behind the scenes that everything is coherent... Except that you can enforce the same thing on a multi-repo if you write the equivalent tooling. All the points are completely moot, except the one that if you don't have a ton of tooling, a single-repo won't work at all, while a multi-repo will just be not great.
The calculus is trickier than this.
He thinks the above is true because of this other thing he says:
> With multiple repos, modularity is the norm.
But if this were true, being "Good" would be easy. I wish programming tools were this capable. Then I could go to the pool every day!
But just because you're using some feature of a build or programming system -- like modules or classes or namespaces -- doesn't mean you get the win. Certainly doesn't mean you know how to wield these tools.
In the end the technical feature doesn't save you. You actually have to have a hard-earned skill, which is how to properly modularize code into stable components with narrow stable interfaces and all of that. This skill is very rare ime.
Now back to monorepo vs modules.
If you use modules but you suck at modularization you're going to be paying a huge tax. Because you'll be creating volatile code/interfaces and you'll have to go through a process for each change. You will be amplifying the tax from your lack of skill in a way you wouldn't if you were just in a single monorepo.
On the other hand, if you use a monorepo and you suck, you won't experience this pain, and you'll be much more likely to stay sucking.
In short, programming language and build features don't bestow skills.
If one team has one application that is split across multiple repositories it can be a productivity boost and a simplification to unite them into a single repo with some single tools and norms.
If you have two teams working primarily on two sets of repos and two different systems or applications, by all means split them into two (or more) repos. Just because it's called a "monorepo" doesn't mean you can't have more than one!
It may be simpler to have one, it may be simpler to have many. Do what's simpler for you! I happen to think that it primarily depends on how your teams are organized, more than on who is in the teams or their "badness" levels.
Honestly, it never occurred to me that you'd deploy from more than one branch. If you can't merge the branches into <your main branch that releases are built from>, then what's in the branch doesn't make it into a release (in my experience).
For instance, big app rewrite, half-new REST API on the backend. Oh, but we need to maintain the old app APIs for those who can't update (like SuperImportantCustomer). Better fork!
Release branches! Deploy 1.1 from the 1.x branch on the same day you deploy 2.2 from the 2.x branch. 1.x merges into 2.x which merged into master.
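The merge-forward part of that flow can be sketched as a small helper (branch names 1.x/2.x taken from the comment above; this assumes fixes land on the oldest affected release branch first):

```shell
# Propagate whatever landed on 1.x up through 2.x and into master,
# so a fix is committed once and flows to every newer release line.
merge_forward() {
  git checkout -q 2.x    && git merge -q --no-edit 1.x &&
  git checkout -q master && git merge -q --no-edit 2.x
}
```

The point of merging forward (rather than cherry-picking each fix onto each branch) is that git tracks what has already been merged, so nothing gets applied twice or forgotten.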
The drawback is that not many people are familiar with submodules and they can be a bit tedious to set up, though working with a submodule is almost like working with a normal file in git. One danger is of course that branching between individual submodules can get messy. Another nuisance might be that you have to commit recursively, i.e. if you have one repository with a submodule to which you make changes you need to first commit these changes in the submodule and then create a new commit in the parent repository that adds the new version of the submodule. Maybe this is a good thing though as it forces you to commit changes individually in each submodule before committing a larger change into your main repository. In general I would avoid nesting submodules more than two levels deep, as this can quickly get confusing.
In the past I've also worked on a large mono-repository and enjoyed it as well; just curious to hear if anybody has used submodules in a larger team.
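The "commit recursively" dance described above can be wrapped in a small helper (the path and messages here are hypothetical): commit inside the submodule first, then record the submodule's new sha1 in the parent.

```shell
# Commit pending changes inside submodule $1, then commit the updated
# submodule pointer in the parent repo. Run from the parent's root.
commit_submodule_change() {
  local sub=$1 msg=$2
  git -C "$sub" add -A
  git -C "$sub" commit -q -m "$msg"
  git add "$sub"                     # stages the new submodule sha1
  git commit -q -m "bump $sub: $msg"
}
```

Forgetting the second half of this (the parent commit) is the classic submodule mistake: your teammates then fetch a parent that still points at the old submodule sha1.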
As you say, getting a change first into the child and then into the parent requires double PRs and testing. But that's the same as if you had it as a package dependency. Only that instead of a version number, you have a sha1, and you never know what it corresponds to.
I prefer a package dependency, because it forces you to explicitly make a release, which should have passed PR and some integration tests. Also, merge conflicts are clearer.
> With multiple repos, modularity is the norm. It's not a must - you technically can have a repo depending on umpteen other repos. But your teammates expect to be able to work with their repo with a minimal set of dependencies.
You'd think so... but no. I am working on a multi-repo project where some repos have dozens of dependencies, all developed locally and interdependent on each other. Bumping the basic repo is very hard and frustrating. I miss my monorepo every day, where I could just make a PR and fix all consumers at once, and where I had a CI which would test all modules at once.
All monorepo projects I've ever worked on enforced the same either through the language involved or mechanically, and it was universally a good thing.
I have heard plenty of people complain about their many-repo structure and wishing for a monorepo. I would like to hear some concrete story where a monorepo went wrong. This article is just abstract opinion.
Usually the PM or PO forces everyone to use some vague "product version" they track for public releases. Places that use monorepos well tend to have very few branches (like, maybe 2 or 3), which have nothing to do with your public product versioning scheme, and instead use "branch by abstraction" to stay integrated.
But what I end up seeing, is that the product team and middle management gets involved with dictating version control, e.g., "this will be version 1.2, and then that should be 2.3, ok let's cut those branches...", and then they change their minds as some team has to delay and before another is ready. And then bugs start rolling in from both testing and they don't know what to do, and they start asking people to just "get it done", and then, things really start falling apart. You add 10+ teams trying to use branches for their own work based off of god knows what and it becomes a mess of crazy integration problems.
I seriously think that a huge benefit of multiple repositories is that it scares the pseudo-technical managers and product people out of bothering to track or dictate usage of the version control system.
For example, I can be working on project B and need to make a change to lib A, so I make the change, commit my work, and now project Z broke. Now I have to learn whatever the hell project Z is, because it's not my responsibility and we may not even have anyone responsible for it. Then I have to work out if the changes to lib A need to be reverted or made backward compatible, or if project Z needs to be updated. Multiply this sort of thing by 10 libraries and 40 apps, and the complexity that every individual developer has to deal with goes off the charts.
Separate repos with versioned packages don't necessarily fix this, but they do let you manage it a lot better: whoever is working on project Z can update its version of lib A at an appropriate time (or never).
Postponing the required change to Z could be seen as beneficial in some scenarios, but what if the change you made to lib A was a security fix? Then you would want all apps using that lib to be force-updated right away. In that case your change should be backwards compatible, monorepo or not.
If you want to have reusable components then make sure they are reusable. If you want a special version of lib A that only works with lib B, you are essentially forking lib A, making it no longer a reusable lib, just a subdir of project B. Interface versioning could help with such non-backwards-compatible changes; in a monorepo you normally do this with a /2.0 directory.
This wasn't an insurmountable problem. Something like Bazel probably would've done wonders here rather than our homegrown incremental testing logic (as well as nailing down why incremental builds weren't happening). Personally that's where I would've invested time rather than splitting up the repo. I'm not sure what ended up happening. I moved onto another project before seeing the conclusion of that conversation.
FWIW, a younger me led a break-up of another monorepo into a multi-repo that I now regret and think caused more pain than it was worth (likely because I split the repo along the wrong lines). And so I disagree with the premise of the article. If you split your repos incorrectly, you can cause more pain than not splitting your repos at all. Long build times are annoying and a velocity-killer. Moreover in the long run you can get ball-of-mud problems that repo boundaries make harder (that was probably the biggest impetus for why I wanted to break up the repo in the first place). However, incorrect version linking due to miscoordination of fast-moving dependencies in different repositories is a production-services-killer, and that caused us no end of frustrations. This was in addition to the annoyances around the fact that we had several different JVM languages that all had different build systems in each repo meaning that cross-repo edits were even more difficult than usual to corral together locally on a developer's machine since the build artifacts all depended on each other, but this was expressed in different ways in different repos.
Just as a bad abstraction is worse than no abstraction, I believe bad modularity is worse than no modularity.
Note that tooling helps this as well; tooling that exposes the transitive dependency chain of production services can reveal inconsistencies in what you thought was the version of a dependency that was deployed and what was ultimately deployed. But that means that both multirepos and monorepos need tooling.
As for the philosophical issue that yosefk raises, I generally advocate solutions that work for the case that the team is good. I tend to think that if the team is not good you would be cooked anyway. Also, if you raise the stakes people might actually start learning a bit.
These sorts of black-and-white, naked assertions drive me nuts. Buried in this statement is the assumption that the only software model worth even discussing is SaaS software - all copies of the code being run are run by your team, so master@HEAD is your ground truth at all times (except during deployments, which BTW are happening at least 10 minutes out of every hour of every day...)
Teams that sell applications or allow self-hosting, or even some SaaS shops with large enough customers, are going to have to maintain multiple release branches. Possibly for years. From personal experience, anything above 3 seems to become unsustainable. But having 3 lines (a monorepo with 2 active release branches + master) may be the right answer for you. One can't work, and 100 is murder. Stop the pendulum in the middle.
The most obvious issue with this post is it fails to acknowledge that for any production scale company, you can't blindly say that a decision like this is a choice of personal philosophy (unless you're starting a brand new project from scratch, in which case, spending tons of time structuring repos well probably isn't your first priority since new projects have very little code).
I'd love to see more articles that discuss repo structure in the context of a pre-existing codebase with hundreds of thousands of lines of production code and 10+ engineers collaborating on it.
For anyone reading: I'd be interested to hear anecdotes from people working at companies that have successfully (or unsuccessfully) re-structured a monorepo, the reasons you did it, how much time was invested in the restructure, and whether you think it was a net positive long-term.
A 'simple' solution may not be easy to grok. Few people think at the level of axioms.
A survivable solution must be passed along to many people across generations.
For some, monorepos are simple because that is what they know, for others multirepos are the norm. The survival of the firms that adopt these strategies will somewhat dictate what repo strategy propagates in the world, not the best design.
The real argument that the author is making is that worse is better. Monorepo requires good tooling and a disciplined team. Multi-repo is worse but it is easier to manage when dealing with inexperienced programmers who want to have their own repo for their shiny little microservice.
IMO, the difference is that multi-repo has limitations and mono-repo has challenges. You will never get atomic commits and precise versioning in multi-repo. With mono-repo, there are a lot of challenges that can be solved with good engineering.
In the real world you ask your doctor if it's okay and he says "sure" because he's read a few things about it and it's what you seem to want. But he doesn't really know, because no one is capable of understanding the human body in its full complexity.
So you just end up taking Monorepo and hoping it doesn't make you severely depressed or give you seizures that send you to the emergency room.
Side effects include but are not limited to your repo growing into a single giant ball of circular dependencies.
Should glibc, gcc and the kernel be in a monorepo?
(Cue laugh track ...)
I dislike having all my code under one 'src' directory; it's nice to have modules like foo-ui and foo-util and so forth. Knowing that one module doesn't use UI components is nice, because you can use it on the backend, for example.
But it's all fully integrated and tested together.
In that context, if a thing is tough but has value, you make a path to it. First make the tools consistent, and then make people consistently use the tools. The more predictable the system becomes (predictability is the opposite of magic!), the more you insist on people using it. Pushback is a kind of feedback, and you have to address at least some of the concerns of people who refuse ('meet me halfway here').
Someday it will shock no one to say that Git is not the best of all possible version control tools. If this is difficult, it may not be the people. Maybe it's time to start thinking about the next version control system?
SVN had some pretty decent facilities for monorepos. Some people will tell you that Git traded some of these features for others, but looking through the information architecture documentation for git, I don't think I can agree. Some of that information is there, it's just maybe not packaged for consumption.
How is this an argument? You merge the unforked projects and it's equivalent to multirepo. I don't follow the argument at all.
...I started writing rebuttals to the others but I guess when your argument is "yeah you can do it right but you _could_ do it wrong and I have defined the question in such a way that we err on success for multi-repo and err on failure on mono-repo" I can't really fight that.
If branch A on repo X will only work with branch B on repo Y, you're holding that relationship in an uncoded way. It's true and unrepresented, and you never want that.
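One way to represent that relationship in code, instead of leaving it in people's heads, is to pin the dependency repos in a manifest checked into repo X itself. A minimal sketch, assuming a simple hand-rolled manifest rather than any particular tool (the repo names, branches, and commits are hypothetical):

```python
# Sketch: a checked-in manifest recording which branch/commit of each
# dependency repo this branch of repo X is known to work with. Because the
# manifest lives in repo X, the relationship is versioned with the code.
MANIFEST = {
    "repo-Y": {"branch": "feature-B", "commit": "a1b2c3d"},
    "repo-Z": {"branch": "master", "commit": "e4f5a6b"},
}

def checkout_commands(manifest):
    """Emit the git commands a CI job would run to reproduce the pinned state."""
    cmds = []
    for repo, pin in sorted(manifest.items()):
        cmds.append(f"git -C {repo} fetch origin {pin['branch']}")
        cmds.append(f"git -C {repo} checkout {pin['commit']}")
    return cmds

for cmd in checkout_commands(MANIFEST):
    print(cmd)
```

Git submodules and tools like Android's `repo` manifest do essentially this; the point is just that the branch-to-branch coupling becomes diffable and reviewable instead of tribal knowledge.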
1. Monorepo: Google3
2. Non monorepo, large pile of #@$: Microsoft Exchange
3. Non monorepo, Amazon
After working with them, from my personal experience, the monorepo was the best. Yes, Google has a ton of internal stuff and could get away with not using many external dependencies, but when everything works, it works like a charm. The convenience of defining protobufs/contracts, the ease of referencing them, and a ton of other things are given to you when you're in the system.
At Google I never felt that the system was hostile to you. It was extremely easy to start hacking on something if you wanted to. Yes, it's not only the monorepo but the overall quality of the tools available, though the monorepo is quite a significant part of it.
> ...
> In a not-so-good team, your monorepo will grow into a single giant ball of circular dependencies.
Sounds like the author was lucky enough to not encounter a not-so-good team with multiple repos.... :D
This way, you'll have to deal with
* difficult tooling
* dependency hell between the many repos (which are now way more tightly coupled, because the dependency graph between them is denser)
* long living branches causing way more collateral damage as described in the original article
* cross-repo changes have become even harder for all the reasons above
You get all the bad things and avoid those advantages! Welcome to my world :)
I don’t quite get the point of this article though. I am a very enthusiastic but not rockstar programmer, and I don’t have problems with a large mono repo.
Yossi's post comes off as extremely pretentious, without really explaining why people consider multi-repo projects to begin with.
Hahaha...
Stop it and focus instead on one single web app that anyone can run. Unless your app is going to be in the top 10 that people can't miss downloading.
> Branching: getting forked by your worst programmer

This example seems contrived. I've never worked anywhere where having a fork that works for some scenarios and not others is tolerated for long; it would be given the highest priority.
> Modularity: demoted from a norm to an ideal

This is basically saying that a multi-repo setup makes it hard to reuse your own code, which increases modularity. I find devs on very large projects are already reticent to reuse code from other projects/teams, and this pushes them even farther toward rewriting domain logic that should probably be shared. In most cases I'd trade a little bit of modularity for increased domain logic consistency.
> Tooling: is yours better than the standard?

There are few organizations that have so much code they break available source control solutions but simultaneously don't have the technical expertise to manage a monorepo that large. For those, I guess it makes sense to break it up into manageable pieces based on the relationships between your projects.
I've worked on subpar teams that decided against a monorepo, and it was a nightmare. It took forever to get set up, the build time was days, and cross-repo edits were painful. They regretted going multi-repo.
With multiple repos it gets even crazier. You have forks for A and B that work with some other software repo, maybe common, maybe forked itself. Soon you have wiki pages with compatibility matrices, common libraries that mysteriously break with minor changes despite being battle tested ... for some set of versions.
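At minimum, that wiki compatibility matrix can be turned into data that CI can check before a build. A toy sketch, with made-up repo names and version numbers:

```python
# Sketch: encode the cross-repo compatibility matrix as data and check a
# proposed combination against it, instead of keeping it on a wiki page.
# All repo names and version numbers here are invented for illustration.
COMPATIBLE = {
    # (repo-A version, repo-B version, common-lib version) known to work together
    ("2.1", "3.0", "1.4"),
    ("2.1", "3.1", "1.5"),
    ("2.2", "3.1", "1.5"),
}

def is_known_good(repo_a, repo_b, common_lib):
    """Return True only for combinations that have actually been tested together."""
    return (repo_a, repo_b, common_lib) in COMPATIBLE

print(is_known_good("2.1", "3.0", "1.4"))  # a tested combination
print(is_known_good("2.2", "3.0", "1.4"))  # untested: treat as broken
```

Of course, maintaining this set by hand is exactly the overhead a monorepo's single tested revision makes unnecessary.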
If software is big enough to force considering monorepos, then cross-cutting dependencies will happen, and then one diffable long-lived monobranch is much better than a bunch of interleaved ones.
Incrementally building, landing, etc., cross-cutting deps becomes much less of a slog. Ex: skip concerns about versioning of _internal_ APIs.
The other issue I had was the sleight-of-hand on build modules. Yes, you don't need good devs to speed up incremental builds b/c you can search/build in each project. But if you want to run 20 modules together to test/experience them, good luck, esp. for interactive modes. (Congrats, you reinvented the monorepo!)
We are moving to a mono-repo for this very reason. Debugging something where you had to make changes to multiple projects, wait for those changes to go through the build pipeline, pull in updated packages in projects that depended on them, then change those projects (and on), was a complete nightmare.
We're a bit over halfway there (the most painful half) and I expect us to get the rest done sometime in the next 3 months or so. Not looking back. At all.
Facebook's Mononoke (https://github.com/facebookexperimental/mononoke) pretty much removes that argument. They outgrew their current source control, and that's their path forward for the next couple orders of magnitude.
> The version that we provide on GitHub does not build yet.
So, maybe eventually. They found that Git didn't do well at their scale, so they modified mercurial instead.
I'm all in favor of a company-wide mono-repo if it doesn't have scaling issues.
In my experience that is not the entire trade-off. At scale, you also get circular dependencies between modules, which make it impossible to do refactoring, migrations, deprecations, and other improvements incrementally. Sometimes this can happen unintentionally, through including an "upcall" to a module that happens to be the best tool for the local job at hand.
In the case of several repos, you will notice the extra work needed to pull in the extra project. In the case of a monorepo... it might look like any benign change.
I think one of the points of the article is that this isn't (sufficiently) true; people will use tools incorrectly and make excuses for deviating from the strategy often enough that it's a problem.
> Ugh I don't like his use of the word dumbass
Substitute "well-meaning person who makes a totally understandable process mistake"