> Of course in a Good team, needless dependencies would be weeded out in code reviews, and a Culture would evolve over time avoiding needless dependencies.
Really, the one consistent thing is that if you have a good team, you'll make it work no matter what tech or decisions you choose (assuming you're also good enough to know when you've lost and change course); and if you have a bad team, you're doomed to failure, because, well, you're bad (by definition).
I think this article also vastly underestimates the cost and annoyance of the tooling of CI'ing a large number of repos, especially if you have to match or do some kind of cross product on the feature branches. (such as, repo A branch B can only be built with repo C branch F, but all the other repos should be master)
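To make the annoyance concrete, here's a minimal sketch of the kind of branch-matching script such a setup tends to accumulate (the repo and branch names are hypothetical): for each repo, check out the feature branch if it exists, otherwise fall back to master.

```shell
# checkout_matching BRANCH REPO...
# Check out BRANCH in each repo where it exists; fall back to master elsewhere.
checkout_matching() {
  local branch=$1; shift
  local repo
  for repo in "$@"; do
    if git -C "$repo" rev-parse --verify --quiet "refs/heads/$branch" >/dev/null; then
      git -C "$repo" checkout -q "$branch"
    else
      git -C "$repo" checkout -q master
    fi
  done
}

# e.g.: checkout_matching feature-B repoA repoC repoD
```

And this still doesn't cover the "repo C needs branch F" special case: that usually ends up as a hand-maintained mapping file feeding the CI job.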
The middle ground is vast, and nuanced in many dimensions. Which is a good thing, because there sure aren't very many large, good teams (by this definition of "good").
I think the normal case though is that there are too many cooks, and they spoil the broth. I've had teams where one person wants to go off and make their own repo just because, and can't be convinced to follow the rest of the team. Sometimes these are good people, although I find a lot of "bad" people don't want to be team players and be consistent, even if being consistent means doing something they don't like.
This alone is pretty much what makes me prefer monorepos. If you don't have a stable interface for all of your in-house dependencies (and nobody does early on in a project), you're doomed to spend a ton of time matching branches like this. Not to mention, a naive build process of "grab the latest everything and build it" will break in that period of time when you've merged the feature branch in one repository but not the other.
Heh. This reminded me of a different story, which I remember vaguely enough that I'll paraphrase from memory:
> "The Excel team will never go for it. Their motto is 'Find the dependencies... and eliminate them.'"
> This probably explained why the Excel team had its own C compiler.
There is a little bit of a novel problem in correlating N feature branches and cloning them, but it's not that much more complicated than correlating N subprojects in a monorepo.
You don't want people favoring starting new work while someone else is flailing on old work. The fact that they started this two weeks ago indicates it was probably higher priority than whatever you might start today. If today's work is an emergency (e.g., if we could go back in time, we'd have started this immediately) then sure. But barring extenuating circumstances, go help Paul. He's been staring at that code for weeks and making no progress.
I think having a rule for how many branches/forks (whatever you want to call them) can exist at once might be a good idea. Every time another opportunity to use a branch comes up, the existing branches have to defend their continued existence. Having to explain yourself over and over is a form of positive peer pressure, if potentially a little passive-aggressive (solution: have an assertive person be the messenger).
(responding to the article contents quoted in above comment)
Branching: monorepo or not, if a feature-incomplete development branch for one of the supported targets can "hold the entire organization as a hostage" then the SCM people, and/or the people responsible for the SCM policy, should do some introspection...
Why are deliveries done from a branch which is obviously still in development? Why does code-to-be-released need to depend on incomplete work? Why aren't something like "topic branches" used?
Modularity: monorepo or not, problems will certainly appear when the complexity of the implementation outpaces the capacity created by the design. To get modularity, one needs actual modules with properly designed (= not brittle; DRY, KISS, YAGNI, SOLID, etc.) interfaces between the modules. Now, does monorepo/multirepo really play a role here at all? If everyday changes are constantly modifying the module interfaces in incompatible ways that break existing code, that says something about the design, or rather the insufficiency of it.
Of course, every project and team is different. However, even if a locally optimal choice for the monorepo vs. multirepo question is found, problems existing regardless of monorepo/multirepo will still be there.
Formally speaking, multi-repo management allows a strict subset of the diffs allowed in a mono-repo (because diffs can't extend beyond each repo root). Are the excluded possibilities all bad? No. Are they generally bad? Not really. Are they sometimes bad? Sure. Are they sometimes better than many diffs across many repos? Sure. Can a reasonably competent dev team tell the difference? Sure, usually. Unsurprisingly, this usually requires the exact same tooling as ensuring the quality of microrepo changes.
If you're continuously deploying master, have a healthy ci/cd pipeline, and enforce good merging discipline, you're fine either way.
I'm a little tired of doing things like revving our trace and logging libraries across our 50+ micro repos that represent microservices. That's genuinely obnoxious. Is it bad? No. Is it obviously more or less error prone than the equivalent monorepo update? No. All the bad bits of either strategy just require some tooling and a clear head.
But I'm not sure that's a useful feature anyway:
1) If you are doing a whole-repo refactor (one of the main atomic-commit benefits I see claimed), you still have to run on X -> try to commit X+1. If someone committed in between you may have to redo the whole thing. Or lock the whole monorepo while doing so. Both scenarios seem worse to me for mono, since microrepos stand far less of a chance of conflicting (less frequent commits, less code to consider (faster refactoring tool runs), etc) and a lock would be a far smaller interruption (one repo vs the whole company).
2) Atomic commits don't represent how things are deployed. You still have to deal with version N and N-1 simultaneously. So e.g. breaking refactors of RPC APIs have exactly the same problems in mono vs micro.
On the other hand, downsides are pretty clear and take immense work to sidestep: most tools will either be much slower or not work at all, because they now need to work on 100s or 1000s of times more data than they were developed against. That's probably thousands of man-years of tooling you may have to understand and improve, or wholly replace.
---
The vast majority of monorepo benefits that I usually see claimed are actually tool-standardization benefits. Or "we could build tool X to do that". Or top-level control, like "we can commit for team X". Of course that's useful! But it has nothing to do with monorepo vs microrepo.
Monorepo just happens to be the carrot/stick used to finally achieve standardization. Others could work, this is just the current fad (which, in some ways, is why it sometimes works - it's easier to convince others).
When the right boundaries reveal themselves, you can divide the code up. But who is to say those will still be the right ones in ten years?
If you divide the source code into separate repositories before getting the boundaries right, there's a tremendous amount of friction built into the system preventing the problem from being addressed. Each repository has its own actors, cycles, and version control history, and you break two of those when you start trying to move code across project boundaries. So people just hit things with a hammer or steal functionality (three modules with a function that ostensibly does the same thing but with different bugs).
One of the things I see over and over again is people conflating one repository with one lifecycle. One binary. It's possible to have a monorepo with multiple build artifacts. The first monorepo I ever worked on had 60 build artifacts, and it worked pretty well (the separate artifacts weeded out a lot of circular dependencies).
I can still get inter-version dependency sanity checks with a monorepo. When I am writing new code I can have everything talk to localhost (master@head) or I can have it talk to a shared dev cluster (last labelled version) or some of both, allowing me to test that I haven't created a situation where I can't deploy until I've already deployed.
Monorepos will not save a company from its lack of discipline. But while you can have problems if you do stupid things in a monorepo, with multirepos you will always have to deal with dependency hell and everything that comes with it.
We've spent a lot of time building and iterating on a unified ci/cd environment to support the new repo. Previously each project had its own test/deploy/build/publish story and usually its own Jenkins project. Now, each project is registered and triggers its own steps. Cross-project edits can happen in a single pull request. We have an incredible amount of integration tests (more so than unit tests), and getting them to work cross-project while migrating has been challenging.
We've gone from ~10-15 actively maintained repos to about 3 as we slowly migrate. We have services, libraries, and batch processing all mixed in.
The author's points about forking and long-lived branching being incredibly difficult for most teams are really crucial. We're going to have to invest in educating new members about WHY we have a monorepo, what it means for your development, and how to change your perspective for developing at HEAD. I don't think 'bad' developers make it easier or harder. Instead, clearly articulating the behaviors that differ between a poly-repo and a mono-repo world is the differentiator.
These articles were absolutely crucial to developing our monorepo.
https://trunkbaseddevelopment.com/
http://blog.shippable.com/ci/cd-of-microservices-using-mono-...
https://www.godaddy.com/engineering/2018/06/05/cicd-best-pra...
I feel like this discussion is missing an appreciation for size/scope of repositories vs. size/scope of the organisation developing that software, with a pinch of appreciation for Conway's law.
If your team is a typical team of at most, say, 30 people, then maintaining 15 different repositories is clearly insane, but merging them into a single one likely doesn't truly deserve the moniker "monorepo", because it's just not that large (and varied in scope and purpose) of a project at the end of the day.
Think of it this way: the Linux kernel is certainly a larger project, but nobody thinks of it as a monorepo. Same thing goes for major software projects like Qt.
I think that's the big thing that always puts me off monorepo... We'd basically be going from ten 5 minute builds to one 50 minute build if it wasn't possible to do incremental builds. IIRC Google and MS have purpose built tools that do impact detection to work out what to build for their monorepos to keep build times down.
It's definitely important to consider before jumping in. Going from 5m to 50m compile times would be a major issue for me.
We have plans to use Bazel in the future, but you have to boil the ocean when moving to Bazel and get everything inside it before you get any benefit out of it.
Jenkins can't do it "easily" but it definitely can. I'd be happy to share our Jenkinsfile if you'd like.
Our finding of changes is something like:
#!/bin/bash
set -euxo pipefail

COMPARE_BRANCH=$1
MERGE_BASE=$(git merge-base $COMPARE_BRANCH HEAD)
FILES_CHANGED=$(git diff --name-only $MERGE_BASE | grep '/')
echo ${FILES_CHANGED} | xargs dirname | cut -d "/" -f 1 | sort | uniq
[1] blog.shippable.com/ci/cd-of-microservices-using-mono-repos
In a way this is one of the hallmarks of a monorepo - Interfaces and dependencies changing so quickly it becomes too troublesome for humans to categorize (and re-categorize) them into repositories, so you let a machine (makefiles) do the work instead. And even without a monorepo you still have the same problem, eventually you will have to integrate all your mini repos into one final product, which you want to have tested. This is something you want to do as frequently as possible, ideally on every commit, not by doing major version-steps of sub-projects.
The major technology organizations we hear about usually have at least several monorepos, due to the legacies of acquisitions and mergers if nothing else.
At the scale of thousands of subprojects, I am not entirely sure the benefits are as advertised. There will be subprojects forked to public github.com or gitlab.com to support, if nothing else. And there will be external dependencies to manage; system-level libraries like openssl and libc if nothing else. Even if they are vendored into the monorepo, any upstream regression is a significant problem in a monorepo... and the problem sometimes has to be solved in a big bang instead of incrementally.
At 15, it feels like it's kind of just a toss up. We have several thousand repos, and sometimes we see 5-10 of them that really should be grouped, and we do so. Sometimes we see 1 repo that has 5-10 projects in it, and we break them down. Whatever works.
But when the entire org is on a dozen projects you're potentially in the worst of both worlds. Your repos aren't small enough, or aligned with team ownership enough, to really benefit from it. So it's straight overhead.
Last I heard there were plans to move www into fbsource.
There are certainly not random dependencies on public GitHub pages. Everything is versioned.
There is a mind boggling amount of custom tooling to make this work.
"The theory behind the microkernel is that operating systems are complicated. So you try to get some of the complexity out by modularizing it a lot. The tenet of the microkernel approach is that the kernel, which is the core of the core of the core, should do as little as possible. Its main function is to communicate. All the different things that the computer offers are services that are available through the microkernel communications channels. In the microkernel approach, you’re supposed to split up the problem space so much that none of it is complex. I thought this was stupid. Yes, it makes every single piece simple. But the interactions make it far more complex than it would be if many of the services were included in the kernel itself, as they are in Linux. Think of your brain. Every single piece is simple, but the interactions between the pieces make for a highly complex system. It’s the whole-is-bigger-than-the-parts problem. If you take a problem and split it in half and say that the halves are half as complicated, you’re ignoring the fact that you have to add in the complication of communication between the two halves. The theory behind the microkernel was that you split the kernel into fifty independent parts, and each of the parts is a fiftieth of the complexity. But then everybody ignores the fact that the communication among the parts is actually more complicated than the original system was—never mind the fact that the parts are still not trivial. That’s the biggest argument against microkernels. The simplicity you try to reach is a false simplicity."
Also applies to some microservice architectures I've seen. People completely disregard the complexity (and overhead!) of the interactions between microservices.
They're so easy to set up and they immediately solve problems. But they also create many more, which aren't immediately obvious. And very few people are willing to say, "I was totally wrong to move to them, and let's spend some more precious time rolling them back."
RPCs and local method calls both need to be fault tolerant and race-condition free. As you break up datastores, transactions become more complex, but presumably you had a specific reason to do that, so the complexity isn't an arbitrary choice.
Sure, the communication layer is added complexity, but that too should be abstracted into boilerplate so that you don't have to think about it. Overall the added complexity requires more work, but it shouldn't really make your business-logic problems more complicated.
Firstly, the benefits they offer have often little to do with the architecture itself, but with the bigger picture (separating teams, CIs, allowing different stacks, managing costs, scaling, etc).
Secondly, unlike microkernels, not all microservices have to talk to every single other microservice. If you have a service to send emails, say, there'll be a few services that interact with it, but the majority won't. The same for an image resizing service.
So what you say doesn't necessarily hold.
Although Linux still probably does the best job of being stable compared to the other OSes I use (Windows, macOS). I can't recall the last time I hit a kernel panic or a crash (despite worse drivers in some cases).
Seems like a blatant straw man to me.
So I agree with you that Linus is presenting a straw man and your comment shouldn’t have been downvoted.
When you break up a problem the goal is to find clear bottlenecks of complexity such that you can abstract a thing to its inputs and outputs and ignore the complexity within. You reduce the amount of knowledge required from any given perspective, thus reducing peak cognitive load.
Sure the system is as or possibly slightly more complex, but there is a distinct advantage to reducing the peak complexity of any given sub-problem.
I think this is the argument around which the whole post is made. Everyone does want to work in a small space where they control everything. I want to see git log with just my code commits - so I'll make a microservice out of it.
All other arguments are just there to wrap this one. I think it's wrong.
At an organisational level, a monorepo is more good than bad because it simplifies dependency management and makes for a low-ego team.
If engineers or even team leads don't have permission to create a repo themselves, then you're probably going to see benefits from a monorepo.
At the same time, if you have say ~100 people sharing a repo then you have to make sure that you have tooling that allows each team to customize their build and test environments, which is hard because many CI solutions assume that One Repo = One Build Status. Implicit in the author's reasoning is the principle that good engineers don't make mistakes; they don't break the build, not ever. But of course they do, everyone does, and if you have a hundred developers, builds will be broken and people's productivity ruined.
Perhaps because we're an industry so prone to failure we keep looking for that one solution, that given a good team makes all problems go away. Agile, XP, Monorepos, Containers, Microservices: we tell ourselves will solve our problems and get those pesky business people off our backs for good. But they won't and never will.
What really matters is enablement, how can we get our code into production doing what it was intended to do without having your toes stepped on all the time. If you design your processes and tooling around enablement not the cargo-cult flavour of the month buzzword invented in the modern beautifully architected, yet completely open office spaces of one of the FAANG companies, then maybe then you can get some actual results.
They don't, ever, because the VCS refuses a push if it breaks the build.
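A sketch of how that refusal can be wired up: a pre-receive hook that runs a build check against each pushed commit and rejects the push on failure. The `run_ci_checks` command here is a stand-in for whatever builds/tests a given commit; it is not a real git feature.

```shell
# Sketch of a pre-receive hook. Git feeds "<old> <new> <ref>" lines on stdin;
# run_ci_checks is a hypothetical command that exits non-zero if the commit
# fails the build.
pre_receive() {
  local oldrev newrev refname
  while read -r oldrev newrev refname; do
    if ! run_ci_checks "$newrev"; then
      echo "rejected: $refname breaks the build" >&2
      return 1
    fi
  done
}
```

In practice most shops do this asynchronously (a merge queue or gated commits) rather than synchronously in the hook, since builds are slow.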
That's the problem with all those single-repo discussions. It works perfectly well if you have all the tooling that makes a single repo work like a multi-repo.
And it's great because you can enforce behind the scenes that everything is coherent... Except that you can enforce the same thing on a multi-repo if you write the equivalent tooling. All the points are completely moot, except the one that if you don't have a ton of tooling, a single-repo won't work at all, while a multi-repo will just be not great.
The calculus is trickier than this.
He thinks the above is true because of this other thing he says:
> With multiple repos, modularity is the norm.
But if this were true, being "Good" would be easy. I wish programming tools were this capable. Then I could go to the pool every day!
But just because you're using some feature of a build or programming system -- like modules or classes or namespaces -- doesn't mean you get the win. Certainly doesn't mean you know how to wield these tools.
In the end the technical feature doesn't save you. You actually have to have a hard-earned skill, which is how to properly modularize code into stable components with narrow stable interfaces and all of that. This skill is very rare ime.
Now back to monorepo vs modules.
If you use modules but you suck at modularization you're going to be paying a huge tax. Because you'll be creating volatile code/interfaces and you'll have to go through a process for each change. You will be amplifying the tax from your lack of skill in a way you wouldn't if you were just in a single monorepo.
On the other hand, if you use a monorepo and you suck, you won't experience this pain, and you'll be much more likely to stay sucking.
In short, programming language and build features don't bestow skills.
If one team has one application that is split across multiple repositories it can be a productivity boost and a simplification to unite them into a single repo with some single tools and norms.
If you have two teams working primarily on two sets of repos and two different systems or applications, by all means split them into two (or more) repos. Just because it's called a "monorepo" doesn't mean you can't have more than one!
It may be simpler to have one, it may be simpler to have many. Do what's simpler for you! I happen to think that it primarily depends on how your teams are organized, more than on who is in the teams or their "badness" levels.
Honestly, it never occurred to me that you'd deploy from more than one branch. If you can't merge the branches into <your main branch that releases are built from>, then what's in the branch doesn't make it into a release (in my experience).
For instance, big app rewrite, half-new REST API on the backend. Oh, but we need to maintain the old app APIs for those who can't update (like SuperImportantCustomer). Better fork!
Release branches! Deploy 1.1 from the 1.x branch on the same day you deploy 2.2 from the 2.x branch. 1.x merges into 2.x which merged into master.
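The merge-forward part of that flow can be sketched as a small helper (branch names 1.x/2.x taken from the comment above; this assumes fixes land on the oldest affected release branch first):

```shell
# Propagate whatever landed on 1.x up through 2.x and into master,
# so a fix is committed once and flows to every newer release line.
merge_forward() {
  git checkout -q 2.x    && git merge -q --no-edit 1.x &&
  git checkout -q master && git merge -q --no-edit 2.x
}
```

The point of merging forward (rather than cherry-picking each fix onto each branch) is that git tracks what has already been merged, so nothing gets applied twice or forgotten.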
The drawback is that not many people are familiar with submodules and they can be a bit tedious to set up, though working with a submodule is almost like working with a normal file in git. One danger is of course that branching between individual submodules can get messy. Another nuisance might be that you have to commit recursively, i.e. if you have one repository with a submodule to which you make changes you need to first commit these changes in the submodule and then create a new commit in the parent repository that adds the new version of the submodule. Maybe this is a good thing though as it forces you to commit changes individually in each submodule before committing a larger change into your main repository. In general I would avoid nesting submodules more than two levels deep, as this can quickly get confusing.
In the past I've also worked on a large mono-repository and enjoyed it as well; just curious to hear if anybody has used submodules in a larger team.
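The "commit recursively" dance described above can be wrapped in a small helper (the path and messages here are hypothetical): commit inside the submodule first, then record the submodule's new sha1 in the parent.

```shell
# Commit pending changes inside submodule $1, then commit the updated
# submodule pointer in the parent repo. Run from the parent's root.
commit_submodule_change() {
  local sub=$1 msg=$2
  git -C "$sub" add -A
  git -C "$sub" commit -q -m "$msg"
  git add "$sub"                     # stages the new submodule sha1
  git commit -q -m "bump $sub: $msg"
}
```

Forgetting the second half of this (the parent commit) is the classic submodule mistake: your teammates then fetch a parent that still points at the old submodule sha1.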
As you say, getting a change first into the child and then into the parent requires double PRs and testing. But that's the same as if you had it as a package dependency. Only that instead of a version number, you have a sha1, and you never know what it corresponds to.
I prefer a package dependency, because it forces you to explicitly make a release, which should have passed PR and some integration tests. Also, merge conflicts are clearer.
> With multiple repos, modularity is the norm. It's not a must - you technically can have a repo depending on umpteen other repos. But your teammates expect to be able to work with their repo with a minimal set of dependencies.
You'd think so... but no. I am working on a multi-repo project where some repos have dozens of dependencies, all developed locally and interdependent on each other. Bumping the basic repo is very hard and frustrating. I miss my monorepo every day, where I could just make a PR and fix all consumers at once, and where I had a CI which would test all modules at once.
All monorepo projects I've ever worked on enforced the same either through the language involved or mechanically, and it was universally a good thing.
I have heard plenty of people complain about their many-repo structure and wishing for a monorepo. I would like to hear some concrete story where a monorepo went wrong. This article is just abstract opinion.
Usually the PM or PO forces everyone to use some vague "product version" they track for public releases. Places that use monorepos well tend to have very few branches (like, maybe 2 or 3), which have nothing to do with your public product versioning scheme, and instead use "branch by abstraction" to stay integrated.
But what I end up seeing, is that the product team and middle management gets involved with dictating version control, e.g., "this will be version 1.2, and then that should be 2.3, ok let's cut those branches...", and then they change their minds as some team has to delay and before another is ready. And then bugs start rolling in from both testing and they don't know what to do, and they start asking people to just "get it done", and then, things really start falling apart. You add 10+ teams trying to use branches for their own work based off of god knows what and it becomes a mess of crazy integration problems.
I seriously think that a huge benefit of multiple repositories is that it scares the pseudo-technical managers and product people out of bothering to track or dictate usage of the version control system.
For example, I can be working on project B and need to make a change to lib A, so I make the change, commit my work, and now project Z broke. Now I have to learn whatever the hell project Z is, because it's not my responsibility and we may not even have anyone responsible for it. Then I have to work out if the changes to lib A need to be reverted or made backward compatible, or if project Z needs to be updated. Multiply this sort of thing by 10 libraries and 40 apps, and the complexity that every individual developer has to deal with goes off the charts.
Separate repos with versioned packages don't necessarily fix this, but they do let you manage it a lot better: whoever is working on project Z can update its version of lib A at an appropriate time (or never).
Postponing the required change to Z could be seen as beneficial in some scenarios, but what if the change you made to lib A was a security fix? Then you would want all apps using that lib to be force-updated right away. In that case your change should be backwards compatible, monorepo or not.
If you want to have reusable components then make sure they are reusable. If you want a special version of lib A that only works with lib B, you are essentially forking lib A, making it no longer a reusable lib, just a subdir of project B. Interface versioning could help with such non-backwards-compatible changes; in a monorepo you normally do this with a /2.0 directory.
This wasn't an insurmountable problem. Something like Bazel probably would've done wonders here rather than our homegrown incremental testing logic (as well as nailing down why incremental builds weren't happening). Personally that's where I would've invested time rather than splitting up the repo. I'm not sure what ended up happening. I moved onto another project before seeing the conclusion of that conversation.
FWIW, a younger me led a break-up of another monorepo into a multi-repo that I now regret and think caused more pain than it was worth (likely because I split the repo along the wrong lines). And so I disagree with the premise of the article. If you split your repos incorrectly, you can cause more pain than not splitting your repos at all. Long build times are annoying and a velocity-killer. Moreover in the long run you can get ball-of-mud problems that repo boundaries make harder (that was probably the biggest impetus for why I wanted to break up the repo in the first place). However, incorrect version linking due to miscoordination of fast-moving dependencies in different repositories is a production-services-killer, and that caused us no end of frustrations. This was in addition to the annoyances around the fact that we had several different JVM languages that all had different build systems in each repo meaning that cross-repo edits were even more difficult than usual to corral together locally on a developer's machine since the build artifacts all depended on each other, but this was expressed in different ways in different repos.
Just as a bad abstraction is worse than no abstraction, I believe bad modularity is worse than no modularity.
Note that tooling helps this as well; tooling that exposes the transitive dependency chain of production services can reveal inconsistencies in what you thought was the version of a dependency that was deployed and what was ultimately deployed. But that means that both multirepos and monorepos need tooling.
As for the philosophical issue that yosefk raises, I generally advocate solutions that work for the case that the team is good. I tend to think that if the team is not good you would be cooked anyway. Also, if you raise the stakes people might actually start learning a bit.
These sorts of black-and-white, naked assertions drive me nuts. Buried in this statement is the assumption that the only software model worth even discussing is SaaS software - all copies of the code being run are run by your team, so master@HEAD is your ground truth at all times (except during deployments, which BTW are happening at least 10 minutes out of every hour of every day...)
Teams that sell applications or allow self-hosting, or even some SaaS shops with large enough customers, are going to have to maintain multiple release branches. Possibly for years. From personal experience, anything above 3 seems to become unsustainable. But having 3 lines (a monorepo with 2 active release branches + master) may be the right answer for you. One can't work, and 100 is murder. Stop the pendulum in the middle.
The most obvious issue with this post is it fails to acknowledge that for any production scale company, you can't blindly say that a decision like this is a choice of personal philosophy (unless you're starting a brand new project from scratch, in which case, spending tons of time structuring repos well probably isn't your first priority since new projects have very little code).
I'd love to see more articles that discuss repo structure in the context of a pre-existing codebase with hundreds of thousands of lines of production code and 10+ engineers collaborating on it.
For anyone reading: I'd be interested to hear anecdotes from people working at companies that have successfully (or unsuccessfully) re-structured a monorepo, the reasons you did it, how much time was invested in the restructure, and whether you think it was a net positive long-term.
A 'simple' solution may not be easy to grok. Few people think at the level of axioms.
A survivable solution must be passed along to many people across generations.
For some, monorepos are simple because that is what they know, for others multirepos are the norm. The survival of the firms that adopt these strategies will somewhat dictate what repo strategy propagates in the world, not the best design.
The real argument that the author is making is that worse is better. Monorepo requires good tooling and a disciplined team. Multi-repo is worse but it is easier to manage when dealing with inexperienced programmers who want to have their own repo for their shiny little microservice.
IMO, the difference is that multi-repo has limitations and mono-repo has challenges. You will never get atomic commits and precise versioning in multi-repo. With mono-repo, there are a lot of challenges that can be solved with good engineering.
In the real world you ask your doctor if it's okay and he says "sure" because he's read a few things about it and it's what you seem to want. But he doesn't really know, because no one is capable of understanding the human body in its full complexity.
So you just end up taking Monorepo and hoping it doesn't make you severely depressed or give you seizures that send you to the emergency room.
Side effects include but are not limited to your repo growing into a single giant ball of circular dependencies.
Should glibc, gcc and the kernel be in a monorepo?
(Cue laugh track ...)
I dislike having all my code under one 'src' directory; it's nice to have modules like foo-ui and foo-util and so forth. Knowing that one module doesn't use UI components is nice, because you can use it on the backend, for example.
But it's all fully integrated and tested together.
In that context, if a thing is tough but has value, you make a path to it. First make the tools consistent, and then make people consistently use the tools. The more predictable the system becomes (predictability is the opposite of magic!), the more you insist on people using it. Pushback is a kind of feedback, and you have to address at least some of the concerns of people who refuse ('meet me halfway here').
Someday it will shock no one to say that Git is not the best of all possible version control tools. If this is difficult, it may not be the people. Maybe it's time to start thinking about the next version control system?
SVN had some pretty decent facilities for monorepos. Some people will tell you that Git traded some of these features for others, but looking through the information architecture documentation for git, I don't think I can agree. Some of that information is there, it's just maybe not packaged for consumption.
How is this an argument? You merge the unforked projects and it's equivalent to multirepo. I don't follow the argument at all.
...I started writing rebuttals to the others but I guess when your argument is "yeah you can do it right but you _could_ do it wrong and I have defined the question in such a way that we err on success for multi-repo and err on failure on mono-repo" I can't really fight that.
If branch A on repo X will only work with branch B on repo Y, you're holding that relationship in an uncoded way. It's true and unrepresented, and you never want that.
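One way to represent that relationship in code, instead of leaving it in people's heads, is to pin the dependency repos in a manifest checked into repo X itself. A minimal sketch, assuming a simple hand-rolled manifest rather than any particular tool (the repo names, branches, and commits are hypothetical):

```python
# Sketch: a checked-in manifest recording which branch/commit of each
# dependency repo this branch of repo X is known to work with. Because the
# manifest lives in repo X, the relationship is versioned with the code.
MANIFEST = {
    "repo-Y": {"branch": "feature-B", "commit": "a1b2c3d"},
    "repo-Z": {"branch": "master", "commit": "e4f5a6b"},
}

def checkout_commands(manifest):
    """Emit the git commands a CI job would run to reproduce the pinned state."""
    cmds = []
    for repo, pin in sorted(manifest.items()):
        cmds.append(f"git -C {repo} fetch origin {pin['branch']}")
        cmds.append(f"git -C {repo} checkout {pin['commit']}")
    return cmds

for cmd in checkout_commands(MANIFEST):
    print(cmd)
```

Git submodules and tools like Android's `repo` manifest do essentially this; the point is just that the branch-to-branch coupling becomes diffable and reviewable instead of tribal knowledge.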
1. Monorepo: Google3
2. Non monorepo, large pile of #@$: Microsoft Exchange
3. Non monorepo, Amazon
After working with them, from my personal experience, the monorepo was the best. Yes, Google has a ton of internal stuff and could get away with not using many external dependencies, but when everything works, it works like a charm. The convenience of defining protobufs/contracts, the ease of referencing them, and a ton of other things are given to you when you're in the system.
At Google I never felt that the system was hostile to you. It was extremely easy to start hacking on something if you wanted to. Yes, it's not only the monorepo but the overall quality of the tools available, though the monorepo is quite a significant part of it.
> ...
> In a not-so-good team, your monorepo will grow into a single giant ball of circular dependencies.
Sounds like the author was lucky enough to not encounter a not-so-good team with multiple repos.... :D
This way, you'll have to deal with
* difficult tooling
* dependency hell between the many repos (which are now way more tightly coupled, because the dependency graph between them is denser)
* long living branches causing way more collateral damage as described in the original article
* cross-repo changes have become even harder for all the reasons above
You get all the bad things and avoid those advantages! Welcome to my world :)
I don’t quite get the point of this article though. I am a very enthusiastic but not rockstar programmer, and I don’t have problems with a large mono repo.
Yossi's post comes off as extremely pretentious, without really explaining why people consider multi-repo projects to begin with.
Hahaha...
Stop it and focus instead on one single web app that anyone can run. Unless your app is going to be in the top 10 that people can't miss downloading.
> Branching: getting forked by your worst programmer

This example seems contrived. I've never worked anywhere where having a fork that works for some scenarios and not others is tolerated for long; it would be given the highest priority.
> Modularity: demoted from a norm to an ideal

This is basically saying that a multi-repo setup makes it hard to reuse your own code, which increases modularity. I find devs on very large projects are already reticent to reuse code from other projects/teams, and this pushes them even farther toward rewriting domain logic that should probably be shared. In most cases I'd trade a little bit of modularity for increased domain logic consistency.
> Tooling: is yours better than the standard?

There are few organizations that have so much code they break available source control solutions but simultaneously don't have the technical expertise to manage a monorepo that large. For those, I guess it makes sense to break it up into manageable pieces based on the relationships between your projects.
I've worked on subpar teams that decided against a monorepo, and it was a nightmare. It took forever to get set up, the build time was days, and cross-repo edits were painful. They regretted going multi-repo.
With multiple repos it gets even crazier. You have forks for A and B that work with some other software repo, maybe common, maybe forked itself. Soon you have wiki pages with compatibility matrices, common libraries that mysteriously break with minor changes despite being battle tested ... for some set of versions.
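At minimum, that wiki compatibility matrix can be turned into data that CI can check before a build. A toy sketch, with made-up repo names and version numbers:

```python
# Sketch: encode the cross-repo compatibility matrix as data and check a
# proposed combination against it, instead of keeping it on a wiki page.
# All repo names and version numbers here are invented for illustration.
COMPATIBLE = {
    # (repo-A version, repo-B version, common-lib version) known to work together
    ("2.1", "3.0", "1.4"),
    ("2.1", "3.1", "1.5"),
    ("2.2", "3.1", "1.5"),
}

def is_known_good(repo_a, repo_b, common_lib):
    """Return True only for combinations that have actually been tested together."""
    return (repo_a, repo_b, common_lib) in COMPATIBLE

print(is_known_good("2.1", "3.0", "1.4"))  # a tested combination
print(is_known_good("2.2", "3.0", "1.4"))  # untested: treat as broken
```

Of course, maintaining this set by hand is exactly the overhead a monorepo's single tested revision makes unnecessary.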
If software is big enough to force considering monorepos, then cross-cutting dependencies will happen, and then one diffable long-lived monobranch is much better than a bunch of interleaved ones.
Incrementally building, landing, etc., cross-cutting deps becomes much less of a slog. Ex: skip concerns about versioning of _internal_ APIs.
The other issue I had was the sleight-of-hand on build modules. Yes, you don't need good devs to speed up incremental builds b/c you can search/build in each project. But if you want to run 20 modules together to test/experience them, good luck, esp. for interactive modes. (Congrats, you reinvented the monorepo!)
We are moving to a mono-repo for this very reason. Debugging something where you had to make changes to multiple projects, wait for those changes to go through the build pipeline, pull in updated packages in projects that depended on them, then change those projects (and on), was a complete nightmare.
We're a bit over halfway there (the most painful half) and I expect us to get the rest done sometime in the next 3 months or so. Not looking back. At all.
Facebook's Mononoke (https://github.com/facebookexperimental/mononoke) pretty much removes that argument. They outgrew their current source control, and that's their path forward for the next couple orders of magnitude.
> The version that we provide on GitHub does not build yet.
So, maybe eventually. They found that Git didn't do well at their scale, so they modified mercurial instead.
I'm all in favor of a company-wide mono-repo if it doesn't have scaling issues.
In my experience that is not the entire trade-off. At scale, you also get circular dependencies between modules, which make it impossible to do refactoring, migrations, deprecations, and other improvements incrementally. Sometimes this can happen unintentionally, through including an "upcall" to a module that happens to be the best tool for the local job at hand.
In the case of several repos, you will notice the extra work needed to pull in the extra project. In the case of a monorepo... it might look like any benign change.
I think one of the points of the article is that this isn't (sufficiently) true; people will use tools incorrectly and make excuses for deviating from the strategy often enough that it's a problem.
> Ugh I don't like his use of the word dumbass
Substitute "well-meaning person who makes a totally understandable process mistake"