The biggest dissonance in the purported benefits of monorepos is that a "good" monorepo generally assumes very good interface-design skills across all teams. In reality, the path of least resistance is tacking on more and more unique codepaths (e.g. forking or "rewriting" existing things), so in effect, livability often comes down to how well a team is able to isolate itself from global changes (by choosing stable, boring APIs, inventing its own abstractions in its own little corner, or what have you).
This is very true. Used poorly, monorepos are a crutch that allows a team to pretend that stable interfaces, versioning, and boundaries don't matter. Sure, your team can (theoretically) build the universe from a single git clone. Now what happens when another team needs to deal with your mess? What happens when you add some external dependency and now you have to deal with all of those problems anyway?
[You also shouldn't use git submodules to solve this, because that's basically the same thing but with the added annoyance of git. You should publish your bloody packages. With version numbers. And changelogs. Real version numbers. Real changelogs. Written by humans].
The author mentions the complexity barrier in open source, and I think that's a really interesting observation, but at the same time I think that complexity is the reason free software is alive today. It is definitely overwhelming for newcomers when a project requires a whole bunch of specific pieces that are all from different places. But once you've gotten past that, collaboration between a diverse range of people and organizations becomes an obvious and practical thing instead of a major undertaking. People don't all go off and write their own things from scratch[1], or clone code from place to place because it's too annoying to reuse it properly. Something internal feels similar to something external, which reinforces collective ownership.
Consider that Chromium includes its own everything, takes half a day to build from source, and is decidedly not a community-run project. Debian, meanwhile, is the polar opposite of a monorepo and continues to be alive and well without the oppressive shadow of a single 600 ton schizophrenic gorilla.
I think a lot of the time a team just wants a monorepo because they want a one-stop shop for fetching and building all of the things because internal dependencies are difficult. If that is the case, I think it's always worth considering something like BuildStream. It lets you specify where things are and how to build them, and it provides some useful tools on top of that. It doesn't solve brute-forcing a change across multitudes of applications, but it lowers the barrier to entry, it forces developers to care about deployment once in a while, and it can certainly help you to spot the integration issues when you change an interface without telling anyone.
[1] People will laugh at me for saying that coming from an operating system known for having more window managers than there are text editors, but really, have you seen some proprietary software projects?
Once the migration is done, all you need is a few people that do some Bazel gardening every few weeks, and it's certainly not a full time job. This can be someone that does operations (CI, deployments, etc) or a product/infrastructure engineer, or one of each. Github / Gitlab scale to all but the largest projects, and even then, you can just split into two or three "monorepos" and kick the can down the road. With things like BuildBuddy, it's even easier.
As the article states, there are a lot of little hidden costs and paper cuts when using a many-repo layout. The most prevalent one I've seen is that it obscures copy/paste behavior, since that's much more difficult to detect in a many-repo setup.
Going to Bazel or equivalent is a bit of a mind adjustment, and some languages are better supported than others, but it really starts to pay off in larger projects. Especially if there are more than a few languages in use.
Bazel 1.0 was released in October 2019. If you were using it "a few years ago", I'm guessing you were using a pre-1.0 version. There's not some cutoff where Bazel magically got easy to use, and I still wouldn't describe it as "easy", but the problem it solves is hard to solve well, and the community support for Bazel has gotten a lot better over the past years.
https://github.com/bazelbuild/rules_python
The difficulty and complexity of using Bazel is highly variable. I've seen some projects where using Bazel is just super simple and easy, and some projects where using Bazel required a massive effort (custom toolchains and the like).
I've heard bazel is a bear...
But... all mature build systems are, because they essentially become enterprise workflow engines, process execution engines, internal systems integration hubs, and schedulers. Why? Because that's what an enterprise/mature build system is; it differs from other software packages with the same capabilities only in that it concentrates on build/deploy/CI "business tasks".
My current employer uses Jenkins (which has workflows/pipelines, daemons) and then feeds into Spinnaker (which has a full DAG workflow engine and interface) and likely this is pretty close to a "standard" or "best of breed" cloud build CI system. Of course there is a dedicated team.
Oh, and of course the Gradle build in GitHub has its own pipelines and a general Turing machine to do whatever you want.
- are in the same repo (making it easy to find and change all of them)
- are in the same universe of build/test/deploy services (making integration of your changes atomic)
Atomicity of integration is essential, especially in organizations that move fast and make lots of breaking interface changes. Where it's necessary to make a breaking interface change, it will be OK if and only if you can make that change atomically. Conversely, if you want to be able to make breaking interface changes, the integration and deployment of those changes has to be atomic.
Not having a monorepo & monobuild means that you have to have stringent interface backwards-compatibility commitments. That's fine if you're shipping an operating system, say, but it's usually too painful if you're not shipping anything to third parties.
For me, the atomicity feature is the killer feature of monorepos.
- the cost of having access to more than you need: cognitive load, tooling for filtering, and the extra tooling work required to keep larger repositories performant
- there's also the atomicity of changes, current and past, which one can see and understand
Anyway, it seemed like such an obvious complement to the Perforce/monorepo style of working that I came away surprised that Perforce hadn't hoisted such a thing into their product as a first-class feature. Tracing dependencies across a lot of different build systems is obviously not trivial, but it's not intractable, particularly if the tool is pluggable so that orgs can provide modules to handle their own particular approach.
http://google-engtools.blogspot.com/2011/06/build-in-cloud-a...
One place I've worked at migrated to a monorepo: the ATLAS experiment at CERN. It was not bad, although there were the usual problems with long checkout times. But it worked because we tended to version every single piece of software together in a big "release" anyway (to make scientific results reproducible).
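As an aside, long checkout times can often be mitigated on the client side with shallow and sparse clones. A sketch with a throwaway local repo (all paths and names invented); the same flags work against a real remote:

```shell
set -e

# Build a throwaway "monorepo" to clone from.
src=$(mktemp -d)
(
  cd "$src" && git init -q
  mkdir -p services/payments services/search
  echo pay  > services/payments/main.txt
  echo find > services/search/main.txt
  git add . && git -c user.name=d -c user.email=d@e commit -qm init
)

# Shallow clone (one revision of history) plus sparse checkout of one service.
work=$(mktemp -d)
git clone -q --depth 1 --sparse "file://$src" "$work/mono"
cd "$work/mono"
git sparse-checkout set services/payments

ls services   # only the payments directory is materialized
```

This is a client-side optimization only; the server still stores the full history.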
Your dev tooling also influences the shape of the thing that you write. If you have a monorepo, it encourages you to ship a monolith that freely interoperates with itself. If you have multiple repos that need to be versioned against each other, you will ship components with more stable APIs.
So if you ship a product within which customers are free to update portions at will, then using a monorepo will make things more difficult than necessary.
And if you ship a single unversioned monolith to the world, then using multiple repos adds unnecessary friction to working within the company.
This practice was abandoned, but I don't know the reasoning for why it was abandoned.
* Library maintainers must make sure they don't introduce any regressions for any users at all. There's no major version number you can increment to let people know that something has changed. Development necessarily slows.
* Library users must deal with any breakage in any library they use. Breakage can happen at any time, because everybody effectively builds from HEAD. There are complicated systems in place for tracking ranges of bad CL numbers.
Monorepo isn't entirely to blame for this, but it certainly doesn't help. I've been at Google 15 years and I'm tired of this.
That friction sometimes helps: If it is painful to update Dependency A because it usually means upstreaming changes to A's Dependency B first, for instance, that can often indicate a tight-coupling problem that in a class diagram someone might easily discover and refactor over lunch but in a systems diagram was non-obvious without that "update hell" pain. Solving such tight-coupling problems is hard, and it may mean living with the pain for some time, and while monorepos make that pain go away they never solve those coupling problems (and arguably make it far easier to strongly couple systems that you likely don't want coupled). It's a lot like turning off all the Warnings in your compiler; it makes the immediate dev experience a lot nicer, but it risks missing things that while not problems now may be problems in the future.
I think there are also some benefits to using the same dependency managers for first-party components/libraries as for third-party components. The auto-updating of first-party versions is seen as a benefit to monorepos, but if recent and current CVEs have taught us anything you need to audit and update your third-party components quite regularly. Needing to also update first-party components/libraries with the same dependency managers has some benefits in terms of forcing a regular dependency update cadence, that then also benefits additional developer eyes on third party update rhythms. (Especially as increasingly more dependency managers pick up auto-auditing/security and CVE awareness tooling that runs on each update. There's more likely developer eyeballs on those audit reports if frequently run for first-party components and third-party components.) Dependency managers are their own friction in the process, but necessary friction for third-party components, and there are benefits to first-party components needing the same friction.
As with most software development practices, there is no objectively "right" answer here. Monorepos have less friction in a large org. Friction and pain are sometimes useful tools, even though few people want them in their developer experience. Systems design is hard, and tight coupling is often an easy solution. Looser coupling is often the better, more resilient design, easier to work with at the boundaries and at the "I can trust this other team's repo to be a black box and they let me file bug reports as if they were a second-party vendor" level, which can be its own tool for avoiding mental fatigue.
Once a non-technical person learns that the entire state of a product/project/organization can be described by a hash, they will begin to abuse it for literally everything. And I totally endorse this. It's incredible to watch unfold. An employee passively noting the current commit hash like it's the time of day puts a bit of joy into my brain every time.
Everyone can speak this language. The semantics are ridiculously simple.
The linear version numbers could have an advantage in that regard. If you want to know if CL 12345 is deployed, and you know the current deployment is running as of CL 12350, then it should be in there. Conversely if it's less than that number, it's definitely not in there.
git hashes also have good properties but I'm wondering how non-technical employees use them. Do they know how to dig through the git history?
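For hashes, git can answer the "is it in there?" question directly, no history digging required. A sketch with a throwaway repo (commit messages and names invented): `git merge-base --is-ancestor A B` succeeds exactly when commit A is contained in B's history.

```shell
set -e
repo=$(mktemp -d)
cd "$repo" && git init -q
commit() { git -c user.name=d -c user.email=d@e commit -q --allow-empty -m "$1"; }

commit "fix: widget crash"
fix=$(git rev-parse HEAD)
commit "weekly release build"
deployed=$(git rev-parse HEAD)

# Exit status 0 iff $fix is an ancestor of (i.e. included in) $deployed.
if git merge-base --is-ancestor "$fix" "$deployed"; then
  echo "fix is in the deployed build"
fi
```

Wrap that check in a small internal web page or chat bot and non-technical folks never have to touch git at all.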
Similar to the engineering/BOM-oriented semantics of everything is a drawing with a matching part number?
Edit: I worked at Microsoft, which also uses tons of tiny repos (at least within Azure). I didn't encounter any good cross-repo management tools, though; apart from having a Jira-like ticketing system built in, Azure DevOps seemed quite a bit worse than GitHub.
However, I think the "polyrepo" response to most of these advantages would be to focus on decoupling your systems instead.
Take for instance:
> With a monorepo, you just refactor the API and all of its callers in one commit. That's not always trivial, but it's much easier than it would be with lots of small repos. I've seen APIs with thousands of usages across hundreds of projects get refactored, and with a monorepo setup it's so easy that no one even thinks twice.
Like, that's really cool you can do that. But why are you doing that?! Why are you breaking your API contract and forcing all of your clients to change all at once?
Of course, proper decoupling also requires good engineering. A polyrepo environment can still get horribly tangled, but the natural response to all of these tangling problems in a polyrepo is to move in a direction of looser coupling.
Sure, avoid changing the API contract. But when the time comes to change the API, you can 1) make the change backwards compatible and maintain both methods forever; 2) release a new major version and maintain both versions forever; or 3) just migrate the callers and immediately be free of all the technical debt that would've accrued under 1 and 2. This assumes internal clients; external clients you presumably can't break.
Once you have the dependency, the loosely coupled approach means that (when possible) you avoid making changes to your API contract that would break your clients. I see the appeal of approach #3 that you suggest, but here's the problem I see with that (and maybe you have an answer):
For any change that breaks the contract (outside of trivial things like renaming an identifier) you are necessarily either adding a requirement your clients may not be able to satisfy, or removing a capability they depend upon. Migrating your clients in that case is more than just a simple refactor; the client may need to re-architect so that it can adapt to the change in contract or even move to a new dependency altogether. If you're not the owner of that client, that means you are either interrupting the other team while they are forced to help you with the migration, or you are blocked waiting for them to have the time.
In general, I would say the best approach to making breaking changes to an API is to use a deprecation process. That allows clients to migrate at their own pace. You can of course do that in either a monorepo or a polyrepo approach, but my expectation would be that the monorepo doesn't really provide you with any advantages in that case.
A monorepo is an organizational mess when you're trying to manage and transfer ownership across thousands of teams and contain the blast radius of changes, unless you invest a ton of resources into proprietary tooling that itself requires a bunch of maintenance, since all the open source solutions are terrible at this and their whole data model is built around splitting out individual project repositories. And then, after all that effort, why wouldn't you just use the tooling the way it was intended, the way it's used in the open source model, so you can partition your CI/CD without a bunch of hacks and don't run into bizarre scaling issues with your VCS?
It perplexes me that people advocate for this strategy. All I can think is that it's another one of those cargo-cult ideas everyone adopts because Google did it (so it must be good).
> unless you invest a ton of resources into proprietary tooling that requires a bunch of maintenance, since all the open source solutions are terrible at this and the whole data model is built around splitting out individual project repositories.
Agree that there's a bunch of tooling needed to operate a monorepo, but there's also a bunch of tooling to sanely manage dozens of "microrepos" as well (when an upstream library changes, update downstream libraries' dependency manifest files to take the new version, run tests, report errors back to upstream, etc). I don't know of any open source tools that manage this problem, but I'm guessing they aren't high-quality if only due to the complex nature of the problem space.
> And then after all that effort, why wouldn’t you just use tooling the way it was intended, and the way it’s used in the open source model, so you can partition your CI/CD without a bunch of hacks, and don’t run into bizarre scaling issues with your VCS.
Because the tooling sucks, as previously mentioned. Many changes require touching many repos, which means coordinating many pull requests and manually changing dependency manifest files and so on.
Ultimately, the "repo" concept is limiting. We need tooling that is aware of dependencies one way or another, and sadly all such open source tooling sucks whether it assumes the relevant code lives in a single repo or across many repos.
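To make the manifest-file churn concrete, here's a sketch of the "upstream released, now bump everyone" chore. Repo names, the library name, and the requirements.txt-style manifest are all invented; in real life each loop iteration is also a branch, a pull request, a review, and a CI run:

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Stand-ins for three downstream repos, each pinning the internal library.
for repo in billing search checkout; do
  mkdir "$repo"
  printf 'requests==2.31.0\nlibfoo==1.4.2\n' > "$repo/requirements.txt"
done

# libfoo 2.0.0 ships: every downstream manifest needs its own edit...
for repo in billing search checkout; do
  sed -i 's/^libfoo==.*/libfoo==2.0.0/' "$repo/requirements.txt"
  # ...plus a commit, a PR, and a green CI run. Per repo, every release.
done

grep -h '^libfoo' */requirements.txt   # three separate bumps to shepherd
```

In a monorepo the equivalent is one commit; in a many-repo world you end up writing a bot that does roughly the loop above.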
As someone who's more systems oriented, ideally projects are locked in to a specific versioned dependency, and nothing changes unless a developer of a project explicitly asks for it.
What I've seen is the opposite: someone owns a dependency, is lazy, and wants to perform a breaking operation, and rather than version the change or orchestrate a backwards-compatible change, they use monorepos to "solve" the problem. IMHO it's a bad pattern and leads to a lot of risk.
This needs to happen periodically, when we have slack. Doing it continuously adds risks that aren’t really our job to take.
Since leaving Amazon, I've mostly worked with monorepos and wouldn't go back to multirepos without Brazil-style tooling.
I'm building a build system and a VCS (separately). I want to do it right.
Could you explain to me what Brazil is? Is it the build system? [1] Or is it the VCS that Amazon uses?
If it is the build system, then it appears that versionsets are literally just a list of dependencies with their versions to use for a build. Is that correct? If not, or if you can give me more detail, what are versionsets, exactly?
Also, what are workspaces? Does this quote from one of the comments on the link match?
> A workspace consisted of a version set to track and any packages that were checked out.
[1]: https://gist.github.com/terabyte/15a2d3d407285b8b5a0a7964dd6...
EDIT: Though while most teams have gone towards many smaller packages for their applications, I suspect that most would be better served by a team-level monorepo. That gets you all of the benefits of a monorepo locally and all of the benefits of manyrepo globally, and unless your project hits the ~200+ developer mark, maintaining things will stay tractable.
For context, I was for a very long time at FB so am definitely used to the monorepo way, and recently switched to place which uses github + many repos, and it feels so much worse.
Honest question - how do you actually effectively share code between many repos? Example: How do I know that me changing my backend app’s API doesn’t break any other project in the company potentially calling it? It should be a compile/buildtime error for the other project, but how does that work if everything is in its own little repository?
One way is: Each repo is a responsibility boundary and single source of truth, you use code from other repos the same as any other external dependency.
> How do I know that me changing my backend app’s API doesn’t break any other project in the company potentially calling it?
Changing an API breaks projects using it; you either do versioned APIs and/or coordinate changes with downstream consumers, the same as you would with an API with external customers.
(Another way is “downstream projects checkout their dependencies and build against them as a routine part of their process.“)
Locally, you can use an Amazon-internal tool to check out multiple repos and make changes to all of them. The tooling calls this a "workspace," but it feels very much like working in a monorepo since building and testing can happen at the workspace level.
> How do I know that me changing my backend app’s API doesn’t break any other project in the company potentially calling it?
In terms of change management, Amazon dependency graphs are managed as "version sets." Changes have to be built into a given version set, and that build will also rebuild any package in the version set that consumes the repository whose changes are being built in. (Usually, repositories are configured to build into one of the owning team's version sets on each commit to the primary branch.)
Not sure if it is a generic comment or a comment on TFA:
i) If the latter, I'm compelled to point out that TFA doesn't nearly advocate for monorepos as much as it lists reasons why a few SV companies use it, how they use it, and what they get out of it.
ii) If the former, then this blog post makes for a good read: https://tailscale.com/blog/modules-monoliths-and-microservic...
In fact, it makes it so easy to add new stuff that I didn't even realize we had 21 services til I counted. My first guess was 12.
Even with Dan's point about monorepos making tooling easier, if a version-control tool had a good API, perhaps this point would be moot. Why is it hard to query files and repository dependencies? Should there be some way to model dependencies in your version control system? It'd be interesting to see someone tackle these problems in version control.
Ultimately the problem is that we need tooling which is aware of dependencies, and the repo abstraction isn't. Whether that code lives in a single repo or in many repos is fairly irrelevant, but keeping the code in a single repo is usually a fair bit easier for many things (especially when you're working in a single language since the language's build tools are usually well-suited for this basic case) and you don't need to manually update dependency manifest files, test how a given upstream change affects every downstream package, or coordinate half a dozen PRs for every change.
There's often less tooling available for private repository hosts and private package feeds, but dependency management from a per-repo standpoint is if not a solved problem in practice, an easily solvable problem. (Github has some tools for private repos if you pay for them. Other systems can borrow from the same playbooks.)
(Other languages have similar dependency manifest files, most of which are similarly slurpable by easily automated tooling given the need/chance. Dependency discovery doesn't have to be a problem in multi-repository environments.)
> test how a given upstream change affects every downstream package, or coordinate half a dozen PRs for every change
Some of this is push versus pull questions. One developer needing to push a lot of changes is a lot of work for that one developer. Downstream "owners" needing to pull changes at a frequency is in some cases much tinier slices of work spread out/delegated over a larger team of people, many of whom may be closer to downstream projects to better fix and/or troubleshoot secondary impacts.
Monorepos make push easier, definitely. Sometimes pull is a better workflow. (Especially if you are using the same dependency tooling for third-party components. These days given CVEs and such you want a regular update cadence on third-party components anyway, using the same tools for first-party updates keeps more reason to keep that cadence regular. Lots of small changes over time rather than big upgrade processes all at once.)
All the big shops have multiple repositories. They all broke each one out grudgingly and under some kind of pressure.
The danger with mouthing off on HN is that this place is thick as thieves with people who actually do or did what the bloggers whinge on about.
Though in this case, the blogger has worked at all three of MS, Google, and Twitter, so I wouldn’t be quick to disregard him either.
On our multi-repos I have consistently seen dozens, if not hundreds, of stale pull requests and branches and issues piling up never to be merged. This compounds with a monorepo.
Additionally, how do you avoid doing pointless builds when new features are pushed? I can only imagine what the `.github` folder in a monorepo looks like.
For me it is similar to the "one large file" argument, and why I don't agree: obfuscation is bad, but information hiding is GOOD. When I open a file, I want the information relevant to the current domain I am working in, not all of the information all at once.
Similarly, when I open a github page, I want its issues, pull requests, branches, and wiki to represent the state of a single project. The one I am currently interested in. You lose this with a monorepo.
You can argue "well tooling can..." yes tooling that does not exist and that I do not want to implement. Similar to the "one large file" argument, editors are set up to manage many different files with tabs. You COULD just compile the code and navigate symbols, but that isn't the world we currently live in.
It's simple, with proper tooling, you know exactly the dependencies, so you know which test depend on the affected files and can run those tests, the rest shouldn't be impacted. And that tooling exists. It's not the one you may be using, but it exists, and not just in FAANG.
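As a toy illustration of what such tooling does (the flat map format here is invented, not any real tool's output): given a file-to-target dependency map and the set of changed files, only the affected test targets are selected.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Invented flat map: "<test target> <source file it depends on>".
cat > deps.txt <<'EOF'
//billing:tests billing/api.py
//billing:tests common/money.py
//search:tests search/index.py
EOF

# In a real setup this list would come from: git diff --name-only main...HEAD
changed="common/money.py"

# Select only the targets whose inputs intersect the changed files.
awk -v f="$changed" '$2 == f { print $1 }' deps.txt | sort -u
```

Bazel's `rdeps()` query does the real version of this over its build graph; the principle is the same.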
I don't actually understand this. You can do this with git submodules. It's just a directory structure. Can somebody please explain? If the problem is committing to multiple things at the same time for a point-in-time release, then the answer is tags. Rather than terabytes of git history for a gigantic organisation that has many unrelated projects.
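For the point-in-time release case, the tag workflow is just this (sketch with a throwaway repo; the tag name is invented — apply the same tag in every participating repo and you have a coordinated release point):

```shell
set -e
repo=$(mktemp -d)
cd "$repo" && git init -q
git -c user.name=d -c user.email=d@e commit -q --allow-empty -m "feature work"

# Mark the release point with an annotated tag.
git tag -a v2025.01 -m "January release"
git describe --tags
```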
A good example for you: Google releases the Google Ad Manager API externally periodically, with dated releases. How does having that in a huge monorepo make sense?
It's effectively just a pointer to a hash, and ends up being useless for versioning + a really nice footgun for tracking upstream updates.
The monorepo vs manyrepo tradeoff boils down to this:
Do you want more complicated build + deploy tooling or do you want more complicated dependency management?
If the former, pick monorepo. If the latter, pick manyrepo.
Edit: submodules IS a viable solution for truly third-party repos over which you have no control and don't expect to ever edit.
Does anyone have any useful pointers? I'm in such total agreement with the article that I actually don't know the counterarguments.
For example, the article states:
> [In the other direction,] Forcing dependees to update is actually another benefit of a monorepo.
What happens when the other teams that depend on your work don’t have the time/priority to update their code to work with the new version of your code? The ideal case that monorepo proponents tout is that the team updating the code that is depended on can just update the code for everyone who depends on them… however, that update is not always trivial and might require rework for code deeper inside the other teams projects. Maybe they are depending on functionality that is going away and it requires major work to change to the new functionality, and the team is working on other high priority things and can’t spend the time right now to update their code.
What does the team do? Do they wait until every team that depends on them is ready to work on updating? Do they try to work out how the depending team is using their code so they can update it themselves? How does this work if there are dozens of teams that use the dependency? You can't have every team that creates core shared code be experts on every other team's work. You can end up stuck waiting for every team to be ready to work on updating this dependency.
Imagine if this was how all dependencies in your code worked, and every build task used the latest release of every dependency regardless of major version bumps. You might wake up on a Tuesday and your build fails and now you have to spend a week updating your project to use the latest version. Multiply this by all the dependencies and your priority list is no longer your own, you are forced to spend your time fixing dependencies.
This is why we specify versions in our dependencies, so we can update on our own schedule.
Of course, the downside of this is now you have to support multiple versions of your code, which is the trade off and the problem a monorepo solves.
You are going to end up with downsides either way, the question is which is worse.
Versioned multi-repos may solve this for the team(s) demanding incompatible changes to shared code, but any team that was happy to use the shared code as it currently is, and was expecting to also benefit from upcoming compatible improvements, will see only problems with this "solution".
Better to give the new incompatible behavior a new name. Deprecate the old name. Then callers of the old thing can fix on their own schedule.
The problem with these frequent monorepo discussion threads is that monorepos are at a significant disadvantage when it comes to good existing and available tools (especially open source ones), but most of the boosters work at companies that mostly use good existing and unavailable tools.
I've no problem with the discussion of course, and largely agree with the conceptual superiority in many cases, but on the practical side, the downsides are still significant and IMO overpowering. I've worked at insanely profitable medium sized companies that would use a monorepo if the tools were there, but instead used svn+externals and then git+a very simple script implementing essentially the same thing as svn:externals. The latter is a great option, IMO especially if you flatten all dependencies to the project/top level (i.e., all transitive dependencies specified and versioned at the top level), as you don't have the A->B->C problem where A using an updated C requires work from team C, B, A; you can just do C, A. It also discourages deeply nested dependencies, and bounds dependency count somewhat, and provides a very explicit and conscientious view of your total dependencies. Updates are also easy to partially automate.
> Provide an argument of --depth 1 to the git clone command to copy only the latest revision of a repo:

    git clone --depth [depth] [remote-url]

For example, I've worked in a monorepo that was one giant binary, but I've also worked in a monorepo that was a single repo containing four-ish independent services (all in a single git repo).
It comes down to how efficient you can be with tooling. That's the one thing monorepos really do require: a good upfront investment in tooling, and long-term maintenance. However, I've found the initial setup cost of a complex monorepo with correct tooling is far outweighed by the simplified operational overhead of working inside it.
I mention this here, as maybe I'm missing some obvious solution.
However, my colleague explained that it's a bad idea because any config changes or accidental button presses on gitlab's ci/cd page can bring down or wipe out everybody's cluster. How can that problem be mitigated? It seems intrinsic to monorepo style.
The problem is with your deploy system. You can consider each of the clusters to be a service. Thus, a change in Service A (cluster A), should not trigger a deployment of Service B (cluster B).
My pipeline is split in 2:
1. on bitbucket, we run a pipeline that builds "build artefacts", docker images and "packaged" cloudformation templates.
Each of these artefacts has a list of triggers, either base docker images or source code. I'm building the relevant docker image or cf package based on the triggers (it's quite a naïve glob() use).
2. On aws side, I have something I call AWS Apps, in short a Stack Name, along with a set of triggers (the above build artefacts). On merge to main, I only deploy the AWS Apps affected by new build artefacts.
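A naïve version of that trigger matching can be sketched like this (paths and artefact names are invented):

```shell
set -e
# Changed paths, e.g. taken from the merge diff.
changed="services/api/Dockerfile docs/readme.md"

for f in $changed; do
  case "$f" in
    services/api/*) echo "rebuild: api-image" ;;
    services/web/*) echo "rebuild: web-image" ;;
    *) ;;   # no artefact triggered by this path
  esac
done
```

Here only the api image is rebuilt, and the docs change triggers nothing.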
It's not ideal, but it was handy to have access to all the cloud and application code on the embedded side for stuff like interface definitions for communication protocols. At the same company I worked on another project where the definition files for the cloud interface were in a different repo, so we had to use a submodule, and I preferred the monorepo.
As another commenter said, we used Bazel and there was indeed a smallish team that gave build support. Ramping up new hires on the build system was one of the more painful processes; I had to give support myself to teammates with only around six months' tenure, simply because they hadn't been there from day one.
Instead, everything is based around the idea that you check out the state of the world at some commit, do your build, whatever validation you need, and send it to production. You do this pretty often, ideally multiple times a week. Very occasionally you have an emergency where you want prod + cherrypick, and you generally build tooling that allows saying "build at this commit, but also with these later specific commits merged in".
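That "this commit, plus these later specific commits" build can be sketched with plain git (throwaway repo, invented file names); the real tooling wraps the same two operations:

```shell
set -e
repo=$(mktemp -d)
cd "$repo" && git init -q
commit() { git add -A && git -c user.name=d -c user.email=d@e commit -qm "$1"; }

echo v1 > app.txt       && commit "weekly release"
release=$(git rev-parse HEAD)
echo wip > feature.txt  && commit "risky feature, not wanted in prod"
echo patch > hotfix.txt && commit "urgent fix"
fix=$(git rev-parse HEAD)

# Build prod at the release commit, plus only the urgent fix:
git checkout -q -b prod-hotfix "$release"
git -c user.name=d -c user.email=d@e cherry-pick "$fix" >/dev/null

ls   # app.txt hotfix.txt -- the risky feature is absent
```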
What monorepos I've seen rarely bother with tags in practice, in part because they rarely individually version libraries, but at a technical level you can do it in git, if you need to.
With a monorepo, I had to set up things once and go on my merry way. And I will be able to kick the monorepo-is-too-slow can down the road for a few years.
The number of repos you have should roughly be equal to how many autonomous engineering "groups" you can divide into that work largely independent of other groups. Anything a group touches should probably be in the same repo as everything else that the same group touches.
There are lots of problems associated with SSDs as well as large monorepos. They are more complicated than people realize, but if you did Google Code Jam it teaches them somewhat, though it needs to be explained too. The problem is stories sort of intersect with programming too. Clockwork with SSDs needs to be reworked for Google Code Jams. The problem is Elixir sort of works with stories and programming. Predicate calculus and proof theories are sort of the only way programming will really make sense in a world full of SSDs. LevelDB could be a more interesting problem for Google Code Jams if it had some newer features too. Conflict resolution is Tower of Hanoi, and that has problems with consensus algorithms and concat too. SSDs need to do derivatives for piecing and parting software too, and that is more interesting too.
If I remember correctly, this is how you do it:
1. Create a new empty repo for the monorepo
2. For each repo, 'git mv' all of the contents into a new directory with the repo's name
3. Add the repos to the mono repos as remotes
4. Run 'git merge --allow-unrelated-histories' for each repo
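The steps above can be sketched end to end with two throwaway repos (names invented). Two simplifications: each source repo's files are created under its own subdirectory up front (standing in for step 2's 'git mv'), and each repo is fetched by path and merged via FETCH_HEAD rather than named remotes:

```shell
set -e
work=$(mktemp -d) && cd "$work"

mkrepo() {
  mkdir "$1"
  ( cd "$1" && git init -q
    mkdir "$1" && echo "hello from $1" > "$1/main.txt"
    git add . && git -c user.name=d -c user.email=d@e commit -qm "import $1" )
}
mkrepo alpha
mkrepo beta

# Step 1: a fresh monorepo (with a root commit to merge onto).
mkdir mono && cd mono && git init -q
git -c user.name=d -c user.email=d@e commit -q --allow-empty -m "monorepo root"

# Steps 3-4, per source repo: fetch it, then merge its unrelated history.
for r in alpha beta; do
  git fetch -q "../$r"
  git -c user.name=d -c user.email=d@e \
      merge -q --allow-unrelated-histories -m "merge $r" FETCH_HEAD
done

ls   # alpha  beta -- both histories preserved under their own directories
```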
You will now have a monorepo with preserved history, with each old repo living inside a subdirectory of the new monorepo.

https://github.com/josh-project/josh
It's designed for making multiple Git repos from a monorepo, but I think you should be able to make a skeleton repo that represents your desired final monorepo layout and push your individual repos to the Josh subviews of that repo to combine them all.
(A big advantage of this approach over the multiple unrelated histories is that you don't have the mass move commits since Josh will rewrite all history as if the files were always in that folder, so you don't have to worry about history of individual files getting broken.)
I did a quick google and these instructions seem about right (without the delete step): https://gist.github.com/msrose/2feacb303035d11d2d05
I have never worked with mono repos, but I guess that this task would be somewhat easier, given that all sources are under a single repository.