You can share code without having to stand up infrastructure to host packages and whatnot.
You can separate concerns without introducing the endless complexity of network I/O, queues, and so on. This is kind of a dig at microservices, I guess, which do have their place (decoupled infrastructure, independent scaling).
You can still deploy to separate targets. A code repository is not 1:1 with a deploy target; that's a fake constraint that never actually existed.
Manyrepos ALWAYS end up producing second-class citizens. Test setup isn't as good as in the monorepo, because making it as good would mean duplicating the work N times, and that is obviously wrong.
Common patterns are The Same But Different everywhere, and/or there is crazy complexity in sharing code between repositories to alleviate this (which often brings problems of its own).
It's just... all of that goes away with one/fewer code repositories. So... why? I'm not even anti micro-service, monorepo actually makes MORE sense with microservices IMO. Why do we do this?
Before someone points it out, I do recognize that a monorepo can still be poorly architected. We can all rest assured knowing that poor architecture is poor architecture whether it be monorepo, manyrepo, monolithic, microservice, PHP, Rust, blah blah.
Sure, tooling and configuration can mitigate a lot of that stuff, but most tools don't countenance codebases that are so large that most of the stuff in them is irrelevant to everyone. The natural thing to do is to split it up.
I'm personally against microservices; I think they go way too far the other way and tend to encourage some of the worst software development practices (NIH, code duplication, super weird architecture and viral explosion of dependency injection everywhere), but "we have one repo for everything" is also pretty weird. I mean, the most famous monorepos (Linux, OpenBSD, Google) literally invented tools to deal with them. That should say something.
You could set up your CI to only rebuild what's changed, so you wouldn't be waiting on anything other than what you touched. This usually requires a bit of work up front, but once you've done it for one part of the codebase it is easy and low-maintenance to replicate across the rest. With GitLab CI (and others), you can import bits of YAML configuration here and there to avoid duplicating configuration for such use cases.
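As a sketch of what that might look like in GitLab CI (the job name, paths, and make targets are invented for illustration), the `rules: changes:` keyword keeps a service's job from running unless its own files or the shared lib changed:

```yaml
# Hypothetical monorepo job: only runs when app-1 or the shared lib changes.
build-app-1:
  stage: build
  script:
    - make -C services/app-1 build test
  rules:
    - changes:
        - services/app-1/**/*
        - lib/**/*
```

Once one job looks like this, replicating it per service is mostly copy-paste, or an `include:` of a shared template.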
> I'm personally against microservices
In some cases you need microservices, or at least the ability to run only a single part of the monolith via configuration: for instance, if you want to host part of the codebase in a separate virtual machine for security or scalability.
I don't think the choice is a matter of opinion but a matter of technical/business requirements.
If you’re a team that has a client, several microservices, a DB, etc., it’s way better to have all of that under a single repo than spread across multiple. Monorepos don’t have to be gigantic monstrosities; they can be encapsulated products.
That makes it very easy to roll out changes to any given service without breaking others and helps a lot with the backwards compatibility of services. Makes everything more resilient.
Shared code is always an internal dependency. You think "we are sharing code and this is efficient", but you end up having to take many different parts of the application into account when rolling out one change, because you can easily break something totally out of sight while trying to change another.
I used to be mostly against code duplication. But the more I develop and maintain larger systems, the more I see the value of separating codebases, even if that includes some duplication.
Assuming you get as far as actually having code in one repo, the advantages of monorepo do not come for free. You either use a consolidated build system or you're still linking code using packages. A mono-build is no small task especially if your org is of any sort of complexity.
You'll almost certainly never get away from packages entirely unless you want to pull in the source code of all your dependencies. Not only are you merging it in but you're integrating it into your mono-build system. Doesn't sound feasible or enjoyable to me.
Some people consider the mono-repo a fool's errand. At most places you can take the pragmatic approach: consolidate repos where it makes sense, while keeping the decoupling advantages of packages where those make sense. They both have trade-offs.
It's still possible to have different jobs for different projects, just like before, with some kind of build filter in front of them. Those different build jobs can be managed however they are now. This is common. There is no need for a single giant build mechanism that knows all the things.
Packages are a separate concern. And of course you should use versioned packages, just like you always have. Why re-invent a solved problem? Trying to force library upgrades in n-many services all at once, automatically, is a hard problem. Why invent that, too?
Repo consolidation really shines for me in infrastructure and testing - the things that can touch multiple services at once. That's stuff most devs aren't really involved in day-to-day. I think that a lot of the monorepo hatred comes from not understanding other people's problems.
We are treating it as an experiment. We've got 2 teams with 5-6 devs each sharing the same monorepo. We are using nx.dev as the build tool and it's going pretty well so far.
There are different tech stacks too, but with nx.dev that's been abstracted away. It allows us to share practices, and we've built out the CI/CD and supporting infrastructure on AWS together, which has certainly saved duplicated effort. Possibly one more team coming on board too.
If in the future it's not paying off we can always split. Doesn't need to be a forever decision, is how we are viewing it.
I am convinced that the monorepo (a single repository to store code for all of the organization's microservices) is something google concocted to keep their 250+ infra and devops engineers very busy.
    /src
      /lib
        /mysql-common
        /redis-common
        /...
      /services
        /app-1
        /app-2
        /...
You can use other directories to group assets, database migrations, documentation, whatever. The core build, test, CI, etc. tools can live in the root. The ability to share common code without having to package internal libraries, version them, and roll out updates n-many times is a game changer.
You try to handwave this away with "poorly architected is poorly architected" but... I don't think you can say that and say you don't get the backlash in the same sentence.
You describe that as "all those problems go away" but they don't go away for free! Nor do I agree that they're worth the cost of unfamiliar tooling in most cases, anyway.
They only go away if you get the architecture right, and the monorepo does not have inherent architectural guardrails against getting it wrong. That's the big difference between the hype and the reality.
You can easily get independently-deployed services sharing code in ugly tangled ways. Weird weird ugly crap done because service A just isn't ready to upgrade to the new version of library B even though service C needs it for a new feature, instead of just keeping service A on the previous version of a published artifact for longer. Sometimes that sort of thing is billed as a good thing - "force the team to update all their consumers before making breaking changes" - but in practice you get hacks and weird workarounds.
Weird compilation or runtime errors because this team wanted to use a different JVM language in this part than this other team did in that part, and the tooling got super confused.
"Just use bazel from day 1 and make sure separation of concerns is good" and so on and so on - sure, sure, sure. But now we're not talking monorepos, we're talking specific tooling too. And every layer of doing it differently you add is another chance to fuck up.
- CI/CD that runs on the entire thing
- merge policy that requires the branch to be up to date even for fast-forward merges
- I once spent an entire day trying to merge a PR because CI for other people's changes took N minutes, but mine took N + 1. I kid you not: I resigned a week after that day because I just couldn't anymore.
- Messy organization within the repo
- People wanted tagged releases, so every single service within the monorepo was on the same version, including some with months since their last activity
- The whole kitchen sink in one repo: Terraform, Ansible, source code, fucking Debian packages, secrets (SOPS), whatever else the VPoE thinks is part of engineering
- Includes nested inside includes in pipeline files (this was GitLab CI), impossible to navigate because of name collisions
Your gripes about many-repos are IMO wrong, though. If you have to duplicate things, then you're doing it wrong. The fact that I can grep across services is a godsend.
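A toy illustration of the grep point (directory and setting names invented): with every service in one tree, a single search answers "who uses this?" across all of them.

```shell
# Build a throwaway two-service tree, then search it in one pass.
dir=$(mktemp -d)
mkdir -p "$dir/services/app-1" "$dir/services/app-2"
echo 'RETRY_LIMIT = 3' > "$dir/services/app-1/settings.py"
echo 'RETRY_LIMIT = 5' > "$dir/services/app-2/settings.py"

# One command surfaces every service that touches RETRY_LIMIT;
# with many repos this becomes N clones and N separate searches.
grep -rn 'RETRY_LIMIT' "$dir/services"
```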
I think we should all actively be fighting against Conway's law ("your code resembles your org structure"). Multi-repos are usually a thin facade that ends up reinforcing it, which makes the code harder to develop in and typically makes the architecture worse.
I think the inverse of that, the org mirroring the code, happens as well. And maybe Conway really meant that, too.
I would argue that we should be organizing code in a way that we'd like our teams to be organized. We should use this as a tool for devs to self-organize. Eventually, management will see the cost savings in organizing the people similarly.
I work primarily on "cloud" stuff, so 95% of my world is Terraform and other (asynchronous) configuration-based stuff. I feel the pain more acutely because of the overhead imposed on my work that shouldn't exist, vs compiled software deployed for end-use-cases where there is an expected burden of build + test.
Because of the monorepo, all of my changes go through the same tests, restrictions, and review as executable code. We have to "pass the build", wasting resources testing all of that executable code for every change. It's even worse in a pseudo-regulated space (SOC 2 or similar), since any process applied to one sub-part of the repo applies to everything.
You can argue that parts of the repo can use different processes, but then what's the benefit of keeping it all together vs having different repos with those different processes?
The only reason monorepos are better today is that git sucks at multi-repo (think submodules), and humans suck at managing separate repos. It might also be that nobody takes it far enough (separate repos for everything), but that feels like the first two problems combined; maybe tooling and/or a massive DevEx team could make that work. I'm pretty sure something would need to supplant git before per-subtree repos could work and make sense; until then the monorepo is the least bad of all the bad options.
If it's a waste, why is it even running those tests? That's its own problem.
At Amazon, a small team of just 5 people might own around 100 repos. But they have tooling that makes that easier to organize and manage. Google has a ridiculous amount of code in a single repo, but they too have specialized tooling that makes that manageable.
Since you laid out the case for monorepos, I'll share some points about why one might prefer multirepo. These points mainly center around microservices, since you probably are monorepo by default if you have a monolith:
1. Tooling is more straightforward out of the box. For a given service, create a repo from a template and you're live. You don't need to derp around with lerna or yarn workspaces and figure out how to make those work when some repos use maven, rubygems, or cocoapods in addition to npm.
2. Better enforcement of requiring services to only communicate over API boundaries instead of code-share. Monorepos make violation of service boundaries too easy.
3. If you're doing a monolith, monorepos may be fine, but there are a ton of problems you run into if you use them for microservices. If you have one mega pipeline for all services, what happens if a single service fails to deploy or fails post-deployment CI checks? If you have multiple pipelines, what do you do when 4/12 pipelines fail? How do you track which commit each service is at in different stages? What happens if CI checks end up failing for some other service than the one you actually touched on your PR submission?
4. Less merge-conflict/out of date branch noise when developing.
5. Able to see which commits, PRs, issues, etc. are associated with which service without manual labor or building tooling to auto-label and auto-tag.
6. Possible to introduce fine-grained permissions on different repos/sections of the code. You can limit view permissions of top-secret projects, grant teams more ownership of their own repos, etc.
7. Fine-grained permissioning extends to automated tools. If you install a GitHub app on only one repo, what it can do is limited to that repo: a blast-radius reduction you only get with multi-repo. With a monorepo, all GitHub secrets are also shared across the whole repo; if only one service should have access to certain secrets, you can't model that.
8. If you use git tags for service release annotations, it'd get very noisy to have tags for every service all in one monorepo.
9. If you want to generate automated release notes on deployment or library package publication, where can that go in a monorepo? The Github releases API gives you that for free, but if you're doing a monorepo without a monolith, you're going to have to find or build your own tooling here.
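On point 8, one common mitigation (sketched here with invented service names) is namespacing tags with a per-service prefix so they can at least be filtered. It works, but it's exactly the kind of convention that multi-repo gives you for free:

```shell
# Throwaway repo demonstrating per-service release tags in a monorepo.
dir=$(mktemp -d)
cd "$dir"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'initial commit'

# Tag names may contain slashes, so each service gets its own namespace.
git tag app-1/v1.2.3
git tag app-2/v0.9.0

# Listing one service's releases filters out the other services' noise.
git tag --list 'app-1/*'
```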
Because bad monorepos, like monolithic apps, tend toward no separation of concerns, and with a large enough team everyone steps on each other's toes in conflicts, tests, etc. Some tech, like Rails, makes boundaries even harder to enforce.
I get that a good team can manage a monorepo that isn't one big ball of code, and a bad team wouldn't necessarily be better off with microservices either.
But it's super frustrating to hear teams blobbing everything together with no layering or separation and defending it with "but we're a monorepo".
I've found a 1:1 repository-to-deployment mapping is a very useful way of organizing code. I work on a mix of cloud and on-prem apps, and maybe I'm biased by my personal preferences and experience, but I see most of the deployment/release problems, confusing merge conflicts, and build pain coming from products/repositories where this isn't the case.
I'm not quite sure what you mean by "targets" though, so I'll explain what I mean when I say "deployment", in the context of the mix of stuff I work on: the entire repository is released (and/or deployed) in a monolithic way.
This can be a microservice, an executable or installer (on-prem), a package (consumed by other repositories), or even a big monolithic service (eg, multiple web apps + proxy servers + terraform templates).
My reasoning behind this is that a deployment has to be standalone.
For example, a microservice should be deployable independently at any time without breaking anything that consumes it, which means you always have to ensure backwards compatibility. If you can't (or don't) do that, so that you have to do coordinated deploys of several things at the same instant, you don't actually have microservices; you have a monolithic architecture plus all the complications and overhead of managing microservices.
I think when you combine multiple microservices into one repository, it's too easy to break this backwards compatibility contract because from the source code point of view, those mixed versions never exist. My experience is that lots of developers seem to struggle with this conceptually and argue about it being unnecessary when it's raised during a PR.
If you don't want to support backwards compatibility between your "microservices" that's totally fine too: but IMHO you aren't doing microservices, and you shouldn't even design the ability to deploy them independently. When you do coordinated deploys and one fails, the rollback process is awful and you can have extended downtime. Instead it makes sense to have a monolithic deployment process (eg: single terraform file) that deploys them all together, and the easiest way to manage this is to have them in a single repository.
There's another challenge with on-prem software. Having a branch or tag for "Foo v1.1" that also contains the source for "Bar" that is at an unfinished state somewhere between v3.4 and v3.5, and likewise a "Bar v3.5" that contains the source for "Foo v-not-quite-1.2" is just nonsensical. Depending on the branching strategy and types of changes happening it also leads to the team that works on "Foo" fixing merge conflicts they don't understand from changes in "Bar", which are really easy to get wrong. So once again: If they're released independently, they should have their own repositories.
One note: in Render parlance, "services" includes static websites, so even for systems that wouldn't always be considered a monorepo in other contexts, this is useful to launch a static website alongside your code and, in so doing, more clearly communicate what you're doing to the next person to touch it. (Including six-months-from-now you.)
- Render is closest to what Heroku does; by running on their own hardware they can undercut Heroku, which, running on top of AWS, is both limited and apparently unwilling to compete on price.
- Fly aims to be a Heroku alternative but also has a really compelling use case in placing many smaller VMs closer to your customers. I believe they are heading towards scale-to-zero on distributed VM regions for your app. They have some super clever ideas around distributed DB read replicas.
- Railway I think is somewhere in the middle, but I can't understand their pricing: they say "you only pay for what you use" and the pricing page implies that you can use fractional resources. It's unclear whether they have some sort of automatic scale-to-zero or not. The docs are lacking in this regard.
For me, Fly is winning. Probably with Supabase or Crunchy Data for the DB.
I think Fly.io employees are a lot more active on HN which is why you see them mentioned more often. They also write great blog posts which get a lot of attention here.
I also agree with the sibling comment. I came to Fly.io sold on servers running close to the users, but I didn't see much of a performance improvement with my web apps, as they all have to communicate with Firebase. There are probably situations where this technology makes sense, but I haven't found a use case yet.
I am skeptical of edge compute being generalizable. I do foresee a bright future for it in embarrassingly parallel problem spaces, where data sharding can be done cleanly per end user.
Recommend it to anyone moving away from Heroku or looking for a cheap place to start.
If you need some hosting/deployment tool to support the way you structure your code repos, there is a bigger problem.
Your own tooling for your repo should decide what gets triggered, and it's not even as simple as this article makes it out to be. If I edit a readme.md, should that trigger a redeploy because it's in the same directory?
Of course not. Structure your code how you want, then build good tools to trigger other tools. Don't restrict yourself to downstream job types and tools that integrate vertically with your specific structure.
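A minimal sketch of that "tools to trigger other tools" idea, assuming a services/&lt;name&gt;/ layout and a docs-don't-deploy rule (both are assumptions): feed it changed paths, e.g. from `git diff --name-only`, and it prints only the services worth redeploying.

```shell
# Given changed file paths on stdin, print the service directories that
# need a redeploy, ignoring documentation-only changes.
services_to_deploy() {
  grep '^services/' |    # only files inside a service directory
    grep -v '\.md$' |    # a README edit alone should not redeploy
    cut -d/ -f1-2 |      # reduce each path to services/<name>
    sort -u
}

# Example: one real change to app-1, plus doc edits that should be ignored.
printf 'services/app-1/main.go\nservices/app-1/README.md\ndocs/intro.md\n' \
  | services_to_deploy
# prints: services/app-1
```

The point is that this policy lives in your own glue, not in whatever directory conventions a hosting product happens to understand.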
The natural state of a mono-repo is Twitter-like paralysis; avoiding that takes concentrated work, but that work can make a monorepo better than many-repo.