The biggest dissonance in the purported benefits of monorepos is that a "good" monorepo generally assumes very good interface-design skills across all teams. In reality, the path of least resistance is tacking on more and more unique codepaths (e.g. forking or "rewriting" existing things), so in effect, livability often comes down to how well a team is able to isolate itself from global changes (by choosing stable, boring APIs, inventing its own abstractions in its own little corner, or what have you).
This is very true. Used poorly, monorepos are a crutch that allows a team to pretend that stable interfaces, versioning, and boundaries don't matter. Sure, your team can (theoretically) build the universe from a single git clone. Now what happens when another team needs to deal with your mess? What happens when you add some external dependency and now you have to deal with all of those problems anyway?
[You also shouldn't use git submodules to solve this, because that's basically the same thing but with the added annoyance of git. You should publish your bloody packages. With version numbers. And changelogs. Real version numbers. Real changelogs. Written by humans].
The author mentions the complexity barrier in open source, and I think that's a really interesting observation, but at the same time I think that complexity is the reason free software is alive today. It is definitely overwhelming for newcomers when a project requires a whole bunch of specific pieces that are all from different places. But once you've gotten past that, collaboration between a diverse range of people and organizations becomes an obvious and practical thing instead of a major undertaking. People don't all go off and write their own things from scratch[1], or clone code from place to place because it's too annoying to reuse it properly. Something internal feels similar to something external, which reinforces collective ownership.
Consider that Chromium includes its own everything, takes half a day to build from source, and is decidedly not a community-run project. Debian, meanwhile, is the polar opposite of a monorepo and continues to be alive and well without the oppressive shadow of a single 600 ton schizophrenic gorilla.
I think a lot of the time a team just wants a monorepo because they want a one-stop shop for fetching and building all of the things because internal dependencies are difficult. If that is the case, I think it's always worth considering something like BuildStream. It lets you specify where things are and how to build them, and it provides some useful tools on top of that. It doesn't solve brute-forcing a change across multitudes of applications, but it lowers the barrier to entry, it forces developers to care about deployment once in a while, and it can certainly help you to spot the integration issues when you change an interface without telling anyone.
[1] People will laugh at me for saying that coming from an operating system known for having more window managers than there are text editors, but really, have you seen some proprietary software projects?
Once the migration is done, all you need is a few people that do some Bazel gardening every few weeks, and it's certainly not a full time job. This can be someone that does operations (CI, deployments, etc) or a product/infrastructure engineer, or one of each. Github / Gitlab scale to all but the largest projects, and even then, you can just split into two or three "monorepos" and kick the can down the road. With things like BuildBuddy, it's even easier.
As the article states, there are a lot of little hidden costs and paper cuts when using a many-repo layout. The most prevalent one I've seen is that it obscures copy/paste behavior, since that's much more difficult to detect in a many-repo setup.
Going to Bazel or equivalent is a bit of a mind adjustment, and some languages are better supported than others, but it really starts to pay off in larger projects. Especially if there are more than a few languages in use.
Bazel 1.0 was released in October 2019. If you were using it "a few years ago", I'm guessing you were using a pre-1.0 version. There's not some cutoff where Bazel magically got easy to use, and I still wouldn't describe it as "easy", but the problem it solves is hard to solve well, and the community support for Bazel has gotten a lot better over the past years.
https://github.com/bazelbuild/rules_python
The difficulty and complexity of using Bazel is highly variable. I've seen some projects where using Bazel is just super simple and easy, and some projects where using Bazel required a massive effort (custom toolchains and the like).
I've heard bazel is a bear...
But... all mature build systems are, because they essentially become enterprise workflow engines, process execution engines, internal systems integration hubs, and schedulers. Why? Because that's what an enterprise/mature build system is; it differs from other software packages with the same capabilities only in that it concentrates on build/deploy/CI "business tasks".
My current employer uses Jenkins (which has workflows/pipelines, daemons) and then feeds into Spinnaker (which has a full DAG workflow engine and interface) and likely this is pretty close to a "standard" or "best of breed" cloud build CI system. Of course there is a dedicated team.
Oh, and of course the Gradle build in GitHub has its own pipelines and a general Turing machine to do whatever you want.
- are in the same repo (making it easy to find and change all of them)
- are in the same universe of build/test/deploy services (making integration of your changes atomic)
Atomicity of integration is essential, especially in organizations that move fast and make lots of breaking interface changes. Where it's necessary to make a breaking interface change, it will be OK if and only if you can make that change atomically. Conversely, if you want to be able to make breaking interface changes, the integration and deployment of those changes has to be atomic.
Not having a monorepo & monobuild means that you have to have stringent interface backwards-compatibility commitments. That's fine if you're shipping an operating system, say, but it's usually too painful if you're not shipping anything to third parties.
For me, the atomicity feature is the killer feature of monorepos.
- the cost of having access to more than you need: cognitive load, tooling for filtering, and the extra tooling work required to keep larger repositories performant
- there's also the atomicity of changes, current and past, which one can see and understand
Anyway, it seemed like such an obvious complement to the Perforce/monorepo style of working that I came away surprised that Perforce hadn't hoisted such a thing into their product as a first-class feature. Tracing dependencies across a lot of different build systems is obviously not trivial, but it's not intractable, particularly if the tool is pluggable so that orgs can provide modules to handle their own particular approach.
http://google-engtools.blogspot.com/2011/06/build-in-cloud-a...
One place I've worked at migrated to a monorepo: the ATLAS experiment at CERN. It was not bad, although there were the usual problems with long checkout times. But it worked because we tended to version every single piece of software together in a big "release" anyway (to make scientific results reproducible).
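As an aside, long checkout times can often be mitigated on the client side with shallow and sparse clones. A sketch with a throwaway local repo (all paths and names invented); the same flags work against a real remote:

```shell
set -e

# Build a throwaway "monorepo" to clone from.
src=$(mktemp -d)
(
  cd "$src" && git init -q
  mkdir -p services/payments services/search
  echo pay  > services/payments/main.txt
  echo find > services/search/main.txt
  git add . && git -c user.name=d -c user.email=d@e commit -qm init
)

# Shallow clone (one revision of history) plus sparse checkout of one service.
work=$(mktemp -d)
git clone -q --depth 1 --sparse "file://$src" "$work/mono"
cd "$work/mono"
git sparse-checkout set services/payments

ls services   # only the payments directory is materialized
```

This is a client-side optimization only; the server still stores the full history.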
Your dev tooling also influences the shape of the thing that you write. If you have a monorepo, it encourages you to ship a monolith that freely interoperates with itself. If you have multiple repos that need to be versioned against each other, you will ship components with more stable APIs.
So if you ship a product within which customers are free to update portions at will, then using a monorepo will make things more difficult than necessary.
And if you ship a single unversioned monolith to the world, then using multiple repos adds unnecessary friction to working within the company.
This practice was abandoned, but I don't know the reasoning for why it was abandoned.
* Library maintainers must make sure they don't introduce any regressions for any users at all. There's no major version number you can increment to let people know that something has changed. Development necessarily slows.
* Library users must deal with any breakage in any library they use. Breakage can happen at any time, because everybody effectively builds from HEAD. There are complicated systems in place for tracking ranges of bad CL numbers.
Monorepo isn't entirely to blame for this, but it certainly doesn't help. I've been at Google 15 years and I'm tired of this.
That friction sometimes helps: If it is painful to update Dependency A because it usually means upstreaming changes to A's Dependency B first, for instance, that can often indicate a tight-coupling problem that in a class diagram someone might easily discover and refactor over lunch but in a systems diagram was non-obvious without that "update hell" pain. Solving such tight-coupling problems is hard, and it may mean living with the pain for some time, and while monorepos make that pain go away they never solve those coupling problems (and arguably make it far easier to strongly couple systems that you likely don't want coupled). It's a lot like turning off all the Warnings in your compiler; it makes the immediate dev experience a lot nicer, but it risks missing things that while not problems now may be problems in the future.
I think there are also some benefits to using the same dependency managers for first-party components/libraries as for third-party components. The auto-updating of first-party versions is seen as a benefit to monorepos, but if recent and current CVEs have taught us anything you need to audit and update your third-party components quite regularly. Needing to also update first-party components/libraries with the same dependency managers has some benefits in terms of forcing a regular dependency update cadence, that then also benefits additional developer eyes on third party update rhythms. (Especially as increasingly more dependency managers pick up auto-auditing/security and CVE awareness tooling that runs on each update. There's more likely developer eyeballs on those audit reports if frequently run for first-party components and third-party components.) Dependency managers are their own friction in the process, but necessary friction for third-party components, and there are benefits to first-party components needing the same friction.
As with most software development practices, there is no objectively "right" answer here. Monorepos have less friction in a large org. Friction and pain are sometimes useful tools, even though few people want them in their developer experience. Systems design is hard, and tight coupling is often an easy solution. Looser coupling is often the better, more resilient design, easier to work with at the boundaries and at the "I can trust this other team's repo to be a black box and they let me file bug reports as if they were a second-party vendor" level, which can be its own tool for avoiding mental fatigue.
Once a non-technical person learns that the entire state of a product/project/organization can be described by a hash, they will begin to abuse it for literally everything. And I totally endorse this. It's incredible to watch unfold. An employee passively noting the current commit hash like it's the time of day puts a bit of joy into my brain every time.
Everyone can speak this language. The semantics are ridiculously simple.
The linear version numbers could have an advantage in that regard. If you want to know if CL 12345 is deployed, and you know the current deployment is running as of CL 12350, then it should be in there. Conversely if it's less than that number, it's definitely not in there.
git hashes also have good properties but I'm wondering how non-technical employees use them. Do they know how to dig through the git history?
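For hashes, git can answer the "is it in there?" question directly, no history digging required. A sketch with a throwaway repo (commit messages and names invented): `git merge-base --is-ancestor A B` succeeds exactly when commit A is contained in B's history.

```shell
set -e
repo=$(mktemp -d)
cd "$repo" && git init -q
commit() { git -c user.name=d -c user.email=d@e commit -q --allow-empty -m "$1"; }

commit "fix: widget crash"
fix=$(git rev-parse HEAD)
commit "weekly release build"
deployed=$(git rev-parse HEAD)

# Exit status 0 iff $fix is an ancestor of (i.e. included in) $deployed.
if git merge-base --is-ancestor "$fix" "$deployed"; then
  echo "fix is in the deployed build"
fi
```

Wrap that check in a small internal web page or chat bot and non-technical folks never have to touch git at all.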
Similar to the engineering/BOM-oriented semantics of everything is a drawing with a matching part number?
Edit: I worked at Microsoft, which also uses tons of tiny repos (at least within Azure). I didn't encounter any good cross-repo management tools, though; apart from having a Jira-like ticketing system built in, Azure DevOps seemed quite a bit worse than GitHub.
However, I think the "polyrepo" response to most of these advantages would be to focus on decoupling your systems instead.
Take for instance:
> With a monorepo, you just refactor the API and all of its callers in one commit. That's not always trivial, but it's much easier than it would be with lots of small repos. I've seen APIs with thousands of usages across hundreds of projects get refactored, and with a monorepo setup it's so easy that no one even thinks twice.
Like, that's really cool you can do that. But why are you doing that?! Why are you breaking your API contract and forcing all of your clients to change all at once?
Of course, proper decoupling also requires good engineering. A polyrepo environment can still get horribly tangled, but the natural response to all of these tangling problems in a polyrepo is to move in a direction of looser coupling.
Sure, avoid changing the API contract. But when the time comes to change the API, you can 1) make the change backwards compatible and maintain both methods forever; 2) release a new major version and maintain both versions forever; or 3) just migrate the callers and immediately be free of all the technical debt that would've accrued under 1 and 2. This assumes internal clients; external clients you presumably can't break.
Once you have the dependency, the loosely coupled approach means that (when possible) you avoid making changes to your API contract that would break your clients. I see the appeal of approach #3 that you suggest, but here's the problem I see with that (and maybe you have an answer):
For any change that breaks the contract (outside of trivial things like renaming an identifier) you are necessarily either adding a requirement your clients may not be able to satisfy, or removing a capability they depend upon. Migrating your clients in that case is more than just a simple refactor; the client may need to re-architect so that it can adapt to the change in contract or even move to a new dependency altogether. If you're not the owner of that client, that means you are either interrupting the other team while they are forced to help you with the migration, or you are blocked waiting for them to have the time.
In general, I would say the best approach to making breaking changes to an API is to use a deprecation process. That allows clients to migrate at their own pace. You can of course do that in either a monorepo or a polyrepo approach, but my expectation would be that the monorepo doesn't really provide you with any advantages in that case.
A monorepo is an organizational mess when you're trying to manage and transfer ownership across thousands of teams and contain the blast radius of changes, unless you invest a ton of resources into proprietary tooling that itself requires a bunch of maintenance, since all the open source solutions are terrible at this and their whole data model is built around splitting out individual project repositories. And then, after all that effort, why wouldn't you just use the tooling the way it was intended, the way it's used in the open source model, so you can partition your CI/CD without a bunch of hacks and don't run into bizarre scaling issues with your VCS?
It perplexes me that people advocate for this strategy. All I can think is that it's another one of those cargo-cult ideas everyone adopts because Google did it (so it must be good).
> unless you invest a ton of resources into proprietary tooling that requires a bunch of maintenance, since all the open source solutions are terrible at this and the whole data model is built around splitting out individual project repositories.
Agree that there's a bunch of tooling needed to operate a monorepo, but there's also a bunch of tooling to sanely manage dozens of "microrepos" as well (when an upstream library changes, update downstream libraries' dependency manifest files to take the new version, run tests, report errors back to upstream, etc). I don't know of any open source tools that manage this problem, but I'm guessing they aren't high-quality if only due to the complex nature of the problem space.
> And then after all that effort, why wouldn’t you just use tooling the way it was intended, and the way it’s used in the open source model, so you can partition your CI/CD without a bunch of hacks, and don’t run into bizarre scaling issues with your VCS.
Because the tooling sucks, as previously mentioned. Many changes require touching many repos, which means coordinating many pull requests and manually changing dependency manifest files and so on.
Ultimately, the "repo" concept is limiting. We need tooling that is aware of dependencies one way or another, and sadly all such open source tooling sucks whether it assumes the relevant code lives in a single repo or across many repos.
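To make the manifest-file churn concrete, here's a sketch of the "upstream released, now bump everyone" chore. Repo names, the library name, and the requirements.txt-style manifest are all invented; in real life each loop iteration is also a branch, a pull request, a review, and a CI run:

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Stand-ins for three downstream repos, each pinning the internal library.
for repo in billing search checkout; do
  mkdir "$repo"
  printf 'requests==2.31.0\nlibfoo==1.4.2\n' > "$repo/requirements.txt"
done

# libfoo 2.0.0 ships: every downstream manifest needs its own edit...
for repo in billing search checkout; do
  sed -i 's/^libfoo==.*/libfoo==2.0.0/' "$repo/requirements.txt"
  # ...plus a commit, a PR, and a green CI run. Per repo, every release.
done

grep -h '^libfoo' */requirements.txt   # three separate bumps to shepherd
```

In a monorepo the equivalent is one commit; in a many-repo world you end up writing a bot that does roughly the loop above.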
As someone who's more systems oriented, ideally projects are locked in to a specific versioned dependency, and nothing changes unless a developer of a project explicitly asks for it.
What I've seen is the opposite: someone owns a dependency, is lazy, and wants to perform a breaking operation, and rather than version the change or orchestrate a backwards-compatible change, they use monorepos to "solve" the problem. IMHO it's a bad pattern and leads to a lot of risk.
This needs to happen periodically, when we have slack. Doing it continuously adds risks that aren’t really our job to take.
Since leaving Amazon, I've mostly worked with monorepos and wouldn't go back to multirepos without Brazil-style tooling.
I'm building a build system and a VCS (separately). I want to do it right.
Could you explain to me what Brazil is? Is it the build system? [1] Or is it the VCS that Amazon uses?
If it is the build system, then it appears that versionsets are literally just a list of dependencies with their versions to use for a build. Is that correct? If not, or if you can give me more detail, what are versionsets, exactly?
Also, what are workspaces? Does this quote from one of the comments on the link match?
> A workspace consisted of a version set to track and any packages that were checked out.
[1]: https://gist.github.com/terabyte/15a2d3d407285b8b5a0a7964dd6...
EDIT: Though while most teams have gone towards many smaller packages for their applications, I suspect that most would be better served by a team-level monorepo. That gets you all of the benefits of a monorepo locally and all of the benefits of manyrepo globally, and unless your project hits the ~200+ developer mark, maintaining things will stay tractable.
For context, I was for a very long time at FB so am definitely used to the monorepo way, and recently switched to place which uses github + many repos, and it feels so much worse.
Honest question - how do you actually effectively share code between many repos? Example: How do I know that me changing my backend app’s API doesn’t break any other project in the company potentially calling it? It should be a compile/buildtime error for the other project, but how does that work if everything is in its own little repository?
One way is: Each repo is a responsibility boundary and single source of truth, you use code from other repos the same as any other external dependency.
> How do I know that me changing my backend app’s API doesn’t break any other project in the company potentially calling it?
Changing an API breaks projects using it; you either do versioned APIs and/or coordinate changes with downstream consumers, the same as you would with an API with external customers.
(Another way is “downstream projects checkout their dependencies and build against them as a routine part of their process.“)
Locally, you can use an Amazon-internal tool to check out multiple repos and make changes to all of them. The tooling calls this a "workspace," but it feels very much like working in a monorepo since building and testing can happen at the workspace level.
> How do I know that me changing my backend app’s API doesn’t break any other project in the company potentially calling it?
In terms of change management, Amazon dependency graphs are managed as "version sets." Changes have to be built into a given version set, and that build will also rebuild any package in the version set that consumes the repository whose changes are being built in. (Usually, repositories are configured to build into one of the owning team's version sets on each commit to the primary branch.)
Not sure if it is a generic comment or a comment on TFA:
i) If the latter, I'm compelled to point out that TFA doesn't nearly advocate for monorepos as much as it lists reasons why a few SV companies use it, how they use it, and what they get out of it.
ii) If the former, then this blog post makes for a good read: https://tailscale.com/blog/modules-monoliths-and-microservic...
In fact, it makes it so easy to add new stuff that I didn't even realize we had 21 services til I counted. My first guess was 12.
Even with Dan's point about monorepos making tooling easier, if a version-control tool had a good API, perhaps this point would be moot. Why is it hard to query files and repository dependencies? Should there be some way to model dependencies in your version control system? It'd be interesting to see someone tackle these problems in version control.
Ultimately the problem is that we need tooling which is aware of dependencies, and the repo abstraction isn't. Whether that code lives in a single repo or in many repos is fairly irrelevant, but keeping the code in a single repo is usually a fair bit easier for many things (especially when you're working in a single language since the language's build tools are usually well-suited for this basic case) and you don't need to manually update dependency manifest files, test how a given upstream change affects every downstream package, or coordinate half a dozen PRs for every change.
There's often less tooling available for private repository hosts and private package feeds, but dependency management from a per-repo standpoint is if not a solved problem in practice, an easily solvable problem. (Github has some tools for private repos if you pay for them. Other systems can borrow from the same playbooks.)
(Other languages have similar dependency manifest files, most of which are similarly slurpable by easily automated tooling given the need/chance. Dependency discovery doesn't have to be a problem in multi-repository environments.)
> test how a given upstream change affects every downstream package, or coordinate half a dozen PRs for every change
Some of this is push versus pull questions. One developer needing to push a lot of changes is a lot of work for that one developer. Downstream "owners" needing to pull changes at a frequency is in some cases much tinier slices of work spread out/delegated over a larger team of people, many of whom may be closer to downstream projects to better fix and/or troubleshoot secondary impacts.
Monorepos make push easier, definitely. Sometimes pull is a better workflow. (Especially if you are using the same dependency tooling for third-party components. These days given CVEs and such you want a regular update cadence on third-party components anyway, using the same tools for first-party updates keeps more reason to keep that cadence regular. Lots of small changes over time rather than big upgrade processes all at once.)
All the big shops have multiple repositories. They all broke each one out grudgingly and under some kind of pressure.
The danger with mouthing off on HN is that this place is thick as thieves with people who actually do or did what the bloggers whinge on about.
Though in this case, the blogger has worked at all three of MS, Google, and Twitter, so I wouldn’t be quick to disregard him either.
On our multi-repos I have consistently seen dozens, if not hundreds, of stale pull requests and branches and issues piling up never to be merged. This compounds with a monorepo.
Additionally, how do you avoid doing pointless builds when new features are pushed? I can only imagine what the `.github` folder in a monorepo looks like.
For me it is similar to the "one large file" argument, and why I don't agree: obfuscation is bad, but information hiding is GOOD. When I open a file, I want the information relevant to the current domain I am working in, not all of the information all at once.
Similarly, when I open a github page, I want its issues, pull requests, branches, and wiki to represent the state of a single project. The one I am currently interested in. You lose this with a monorepo.
You can argue "well tooling can..." yes tooling that does not exist and that I do not want to implement. Similar to the "one large file" argument, editors are set up to manage many different files with tabs. You COULD just compile the code and navigate symbols, but that isn't the world we currently live in.
It's simple, with proper tooling, you know exactly the dependencies, so you know which test depend on the affected files and can run those tests, the rest shouldn't be impacted. And that tooling exists. It's not the one you may be using, but it exists, and not just in FAANG.
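As a toy illustration of what such tooling does (the flat map format here is invented, not any real tool's output): given a file-to-target dependency map and the set of changed files, only the affected test targets are selected.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Invented flat map: "<test target> <source file it depends on>".
cat > deps.txt <<'EOF'
//billing:tests billing/api.py
//billing:tests common/money.py
//search:tests search/index.py
EOF

# In a real setup this list would come from: git diff --name-only main...HEAD
changed="common/money.py"

# Select only the targets whose inputs intersect the changed files.
awk -v f="$changed" '$2 == f { print $1 }' deps.txt | sort -u
```

Bazel's `rdeps()` query does the real version of this over its build graph; the principle is the same.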
I don't actually understand this. You can do this with git submodules. It's just a directory structure. Can somebody please explain? If the problem is committing to multiple things at the same time for a point-in-time release, then the answer is tags. Rather than terabytes of git history for a gigantic organisation that has many unrelated projects.
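For the point-in-time release case, the tag workflow is just this (sketch with a throwaway repo; the tag name is invented — apply the same tag in every participating repo and you have a coordinated release point):

```shell
set -e
repo=$(mktemp -d)
cd "$repo" && git init -q
git -c user.name=d -c user.email=d@e commit -q --allow-empty -m "feature work"

# Mark the release point with an annotated tag.
git tag -a v2025.01 -m "January release"
git describe --tags
```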
A good example for you: Google releases the Google Ad Manager API externally periodically, with dated releases. How does having that in a huge monorepo make sense?
It's effectively just a pointer to a hash, and ends up being useless for versioning + a really nice footgun for tracking upstream updates.
The monorepo vs manyrepo tradeoff boils down to this:
Do you want more complicated build + deploy tooling or do you want more complicated dependency management?
If the former, pick monorepo. If the latter, pick manyrepo.
Edit: submodules IS a viable solution for truly third-party repos over which you have no control and don't expect to ever edit.
Does anyone have any useful pointers? I'm in such total agreement with the article that I actually don't know the counterarguments.
For example, the article states:
> [In the other direction,] Forcing dependees to update is actually another benefit of a monorepo.
What happens when the other teams that depend on your work don’t have the time/priority to update their code to work with the new version of your code? The ideal case that monorepo proponents tout is that the team updating the code that is depended on can just update the code for everyone who depends on them… however, that update is not always trivial and might require rework for code deeper inside the other teams projects. Maybe they are depending on functionality that is going away and it requires major work to change to the new functionality, and the team is working on other high priority things and can’t spend the time right now to update their code.
What does the team do? Do they wait until every team that depends on them is ready to work on updating? Do they try to work out how the depending team is using their code so they can update it themselves? How does this work if there are dozens of teams that use the dependency? You can't have every team that creates core shared code be experts on every other team's work. You can end up stuck waiting for every team to be ready to work on updating this dependency.
Imagine if this was how all dependencies in your code worked, and every build task used the latest release of every dependency regardless of major version bumps. You might wake up on a Tuesday and your build fails and now you have to spend a week updating your project to use the latest version. Multiply this by all the dependencies and your priority list is no longer your own, you are forced to spend your time fixing dependencies.
This is why we specify versions in our dependencies, so we can update on our own schedule.
Of course, the downside of this is now you have to support multiple versions of your code, which is the trade off and the problem a monorepo solves.
You are going to end up with downsides either way, the question is which is worse.
Versioned multi-repos may solve this for the team(s) demanding incompatible changes to shared code, but any team that was happy to use the shared code as it currently is, and was expecting to also benefit from upcoming compatible improvements, will see only problems with this "solution".
Better to give the new incompatible behavior a new name. Deprecate the old name. Then callers of the old thing can fix on their own schedule.
The problem with these frequent monorepo discussion threads is that monorepos are at a significant disadvantage when it comes to good existing and available tools (especially open source ones), but most of the boosters work at companies that mostly use good existing and unavailable tools.
I've no problem with the discussion of course, and largely agree with the conceptual superiority in many cases, but on the practical side, the downsides are still significant and IMO overpowering. I've worked at insanely profitable medium sized companies that would use a monorepo if the tools were there, but instead used svn+externals and then git+a very simple script implementing essentially the same thing as svn:externals. The latter is a great option, IMO especially if you flatten all dependencies to the project/top level (i.e., all transitive dependencies specified and versioned at the top level), as you don't have the A->B->C problem where A using an updated C requires work from team C, B, A; you can just do C, A. It also discourages deeply nested dependencies, and bounds dependency count somewhat, and provides a very explicit and conscientious view of your total dependencies. Updates are also easy to partially automate.
> Provide an argument of --depth 1 to the git clone command to copy only the latest revision of a repo:

    git clone --depth [depth] [remote-url]

For example, I've worked in a monorepo that was one giant binary, but I've also worked in a monorepo that was a single repo containing four-ish independent services (all in a single git repo).
It comes down to how efficient you can be with tooling. That's the one thing monorepos really do require: a good upfront investment in tooling, and long-term maintenance. However, I've found the initial setup cost of a complex monorepo with correct tooling is far outweighed by the simplified operational overhead of working inside it.
I mention this here, as maybe I'm missing some obvious solution.
However, my colleague explained that it's a bad idea because any config changes or accidental button presses on gitlab's ci/cd page can bring down or wipe out everybody's cluster. How can that problem be mitigated? It seems intrinsic to monorepo style.
The problem is with your deploy system. You can consider each of the clusters to be a service. Thus, a change in Service A (cluster A), should not trigger a deployment of Service B (cluster B).
My pipeline is split in 2:
1. on bitbucket, we run a pipeline that builds "build artefacts", docker images and "packaged" cloudformation templates.
Each of these artefacts has a list of triggers, either base docker images or source code. I'm building the relevant docker image or cf package based on the triggers (it's quite a naïve glob() use).
2. On aws side, I have something I call AWS Apps, in short a Stack Name, along with a set of triggers (the above build artefacts). On merge to main, I only deploy the AWS Apps affected by new build artefacts.
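A naïve version of that trigger matching can be sketched like this (paths and artefact names are invented):

```shell
set -e
# Changed paths, e.g. taken from the merge diff.
changed="services/api/Dockerfile docs/readme.md"

for f in $changed; do
  case "$f" in
    services/api/*) echo "rebuild: api-image" ;;
    services/web/*) echo "rebuild: web-image" ;;
    *) ;;   # no artefact triggered by this path
  esac
done
```

Here only the api image is rebuilt, and the docs change triggers nothing.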
It's not ideal, but it was handy to have access to all the cloud and application code on the embedded side for stuff like interface definitions for communication protocols. At the same company I worked on another project where the definition files for the cloud interface were in a different repo, so we had to use a submodule, and I preferred the monorepo.
As another commenter said, we used Bazel and there was indeed a smallish team that gave build support. Ramping up new hires on the build system was one of the more painful processes; I had to give support myself to teammates with only around six months' tenure, simply because they hadn't been there from day one.
Instead, everything is based around the idea that you check out the state of the world at some commit, do your build, whatever validation you need, and send it to production. You do this pretty often, ideally multiple times a week. Very occasionally you have an emergency where you want prod + cherrypick, and you generally build tooling that allows saying "build at this commit, but also with these later specific commits merged in".
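That "this commit, plus these later specific commits" build can be sketched with plain git (throwaway repo, invented file names); the real tooling wraps the same two operations:

```shell
set -e
repo=$(mktemp -d)
cd "$repo" && git init -q
commit() { git add -A && git -c user.name=d -c user.email=d@e commit -qm "$1"; }

echo v1 > app.txt       && commit "weekly release"
release=$(git rev-parse HEAD)
echo wip > feature.txt  && commit "risky feature, not wanted in prod"
echo patch > hotfix.txt && commit "urgent fix"
fix=$(git rev-parse HEAD)

# Build prod at the release commit, plus only the urgent fix:
git checkout -q -b prod-hotfix "$release"
git -c user.name=d -c user.email=d@e cherry-pick "$fix" >/dev/null

ls   # app.txt hotfix.txt -- the risky feature is absent
```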
What monorepos I've seen rarely bother with tags in practice, in part because they rarely individually version libraries, but at a technical level you can do it in git, if you need to.
With a monorepo, I had to set up things once and go on my merry way. And I will be able to kick the monorepo-is-too-slow can down the road for a few years.
The number of repos you have should roughly be equal to how many autonomous engineering "groups" you can divide into that work largely independent of other groups. Anything a group touches should probably be in the same repo as everything else that the same group touches.
There are lots of problems associated with SSDs as well as large monorepos. They are more complicated than people realize, but if you did Google Code Jam it teaches them somewhat, though it needs to be explained too. The problem is stories sort of intersect with programming too. Clockwork with SSDs needs to be reworked for Google Code Jams. The problem is Elixir sort of works with stories and programming. Predicate calculus and proof theories are sort of the only way programming will really make sense in a world full of SSDs. LevelDB could be a more interesting problem for Google Code Jams if it had some newer features too. Conflict resolution is Tower of Hanoi, and that has problems with consensus algorithms and concat too. SSDs need to do derivatives for piecing and parting software too, and that is more interesting too.
If I remember correctly, this is how you do it:
1. Create a new empty repo for the monorepo
2. For each repo, 'git mv' all of the contents into a new directory with the repo's name
3. Add the repos to the mono repos as remotes
4. Run 'git merge --allow-unrelated-histories' for each repo
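The steps above can be sketched end to end with two throwaway repos (names invented). Two simplifications: each source repo's files are created under its own subdirectory up front (standing in for step 2's 'git mv'), and each repo is fetched by path and merged via FETCH_HEAD rather than named remotes:

```shell
set -e
work=$(mktemp -d) && cd "$work"

mkrepo() {
  mkdir "$1"
  ( cd "$1" && git init -q
    mkdir "$1" && echo "hello from $1" > "$1/main.txt"
    git add . && git -c user.name=d -c user.email=d@e commit -qm "import $1" )
}
mkrepo alpha
mkrepo beta

# Step 1: a fresh monorepo (with a root commit to merge onto).
mkdir mono && cd mono && git init -q
git -c user.name=d -c user.email=d@e commit -q --allow-empty -m "monorepo root"

# Steps 3-4, per source repo: fetch it, then merge its unrelated history.
for r in alpha beta; do
  git fetch -q "../$r"
  git -c user.name=d -c user.email=d@e \
      merge -q --allow-unrelated-histories -m "merge $r" FETCH_HEAD
done

ls   # alpha  beta -- both histories preserved under their own directories
```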
You will now have a monorepo with preserved history, with each old repo living inside a subdirectory of the new monorepo.

https://github.com/josh-project/josh
It's designed for making multiple Git repos from a monorepo, but I think you should be able to make a skeleton repo that represents your desired final monorepo layout and push your individual repos to the Josh subviews of that repo to combine them all.
(A big advantage of this approach over the multiple unrelated histories is that you don't have the mass move commits since Josh will rewrite all history as if the files were always in that folder, so you don't have to worry about history of individual files getting broken.)
I did a quick google and these instructions seem about right (without the delete step): https://gist.github.com/msrose/2feacb303035d11d2d05
I have never worked with mono repos, but I guess that this task would be somewhat easier, given that all sources are under a single repository.